1. Introduction

Consider a class $\mathcal{M}$ of probability measures on a measurable space $X$ and
measurable functions $g_j$ and $h$ on $X$. In a typical moment problem we want
further information on the possible sets of values taken by the integrals $\mu(g_j)$
and $\mu(h)$ as $\mu$ runs through $\mathcal{M}$. The main purpose of the present paper is to
develop into more systematic methods certain principles which in special cases
have been found effective for handling such moment problems.

In Sections 2 through 4 we take up certain frequently occurring moment
problems where the class $\mathcal{M}$ happens to be convex. In Section 2 the space $X$
can be any locally compact Hausdorff space. For $\{h_j, j \in J\}$ an arbitrary
collection (finite or infinite) of lower semicontinuous functions on $X$, we establish
a condition which is both necessary and sufficient for the existence of a regular
probability measure $\mu$ on $X$ satisfying $\mu(h_j) \le \eta_j$ for all $j \in J$. However, we do
assume as a side condition that the $h_j$ dominate each other at infinity in a certain
weak sense. This domination condition is void when $X$ is compact and nearly so
when $h_j \ge 0$.

In Sections 3 and 4 we are interested in the smallest value $L(y)$ of $\mu(h)$ when
it is known that $\mu \in \mathcal{M}$ and that $\mu(g_j) = y_j$ for $j = 1, \dots, n$; the space $X$ can be
any measurable space. Provided this smallest value $L(y)$ is in fact assumed, it
turns out that in the determination of $L(y)$ we only need to consider so-called
admissible measures.

These are defined as the measures $\mu \in \mathcal{M}$ which attain the smallest possible
value $\mu(\psi)$ for some linear combination $\psi$ of the form
$\psi = h - d_1 g_1 - \cdots - d_n g_n$. In the special case that $\mathcal{M}$ consists of all
probability measures on $X$, we have admissibility if and only if the measure is
carried by the set of minima of some such linear combination $\psi$.

In Sections 5 and 6 we are interested in bounds for and inequalities between
the different moments of a sum $S_n = Z_1 + \cdots + Z_n$ of independent random
variables $Z_i$. Here, the $Z_i$ may have different distributions subject to certain
restrictions on these distributions. The resulting collection $\mathcal{M}$ of possible
distributions of $S_n$ is usually not convex.

An essential use is made of the fact that each cumulant $\kappa_j(S_n)$ of $S_n$ is equal to
the sum of the $\kappa_j(Z_i)$. The set $K[q]$ of possible $q$-tuples $(\kappa_1(Z), \dots, \kappa_q(Z))$ is
usually not a convex subset of $R^q$. It turns out that for large $n$ the existing
inequalities between the different moments $E(S_n^j)$, $j = 1, \dots, q$, are more or less
determined by the structure of the convex hull of $K[q]$.
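The additivity of cumulants used here can be checked exactly on small examples. The following is a minimal sketch, assuming two hypothetical finite distributions `z1` and `z2` on $[0, 1]$ that are not taken from the text; the helper names are illustrative only. Cumulants are obtained from raw moments by the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$.

```python
from fractions import Fraction
from math import comb

def raw_moments(dist, q):
    """Raw moments E(Z^k), k = 0..q, of a finite distribution {value: probability}."""
    return [sum(p * v ** k for v, p in dist.items()) for k in range(q + 1)]

def cumulants(m):
    """Cumulants kappa_1..kappa_q from raw moments m[0..q] (standard recursion)."""
    q = len(m) - 1
    kappa = [Fraction(0)] * (q + 1)
    for n in range(1, q + 1):
        kappa[n] = m[n] - sum(
            comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n)
        )
    return kappa[1:]

def convolve(d1, d2):
    """Distribution of Z1 + Z2 for independent Z1 ~ d1 and Z2 ~ d2."""
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, Fraction(0)) + p1 * p2
    return out

# Two illustrative distributions on [0, 1] (hypothetical choices).
z1 = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
z2 = {Fraction(0): Fraction(1, 4), Fraction(1, 2): Fraction(1, 4), Fraction(1): Fraction(1, 2)}

q = 4
k1 = cumulants(raw_moments(z1, q))
k2 = cumulants(raw_moments(z2, q))
k_sum = cumulants(raw_moments(convolve(z1, z2), q))

# Cumulants of the sum versus the sum of the cumulants.
print(k_sum == [a + b for a, b in zip(k1, k2)])
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximate numerical match.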

In Section 5 the resulting inequalities are worked out in detail for the case
that $0 \le Z \le 1$ and $q \le 4$. Section 6 contains among other things an explicit
method for determining the best possible upper bound on $E(\exp\{t S_n\})$ subject
only to the condition that $0 \le Z_i \le 1$, $E(Z_i^j) = c_j$ for $j = 1, \dots, m$, all $i$, while
$E(S_n^j) = d_j$ for $j = m + 1, \dots, q$. Here $q \le 2m + 1$ and the $Z_i$ may have
different distributions.

2.  A  general  moment  problem 

2.1. In the present section we treat a frequently occurring moment problem
which may involve infinitely many side conditions. In the sequel, $X$ denotes a
locally compact Hausdorff space made into a measurable space by the $\sigma$-field
of all Borel subsets of $X$. We shall often employ lower semicontinuous functions
$h: X \to R$. This means that $h(x) \le \liminf_{y \to x} h(y)$ for all $x \in X$; equivalently,
the set $\{x: h(x) > c\}$ is always open; such a function $h$ is bounded below on
each compact subset of $X$.

Further on, $\{h_j, j \in J\}$ will denote a given finite or infinite collection of lower
semicontinuous functions on $X$ (such as the characteristic functions of the open
subsets of $X$). This collection is sometimes denoted by $\mathcal{H}$. Next, $\{\eta_j, j \in J\}$
denotes a given real valued function on the index set $J$. Finally, $\mathcal{M}^*$ will stand for
the class of all regular probability measures $\mu$ on $X$ such that each $h_j$ is integrable
relative to $\mu$ in such a way that

(2.1)  $\mu(h_j) = \int h_j(x)\,\mu(dx) \le \eta_j$ for each $j \in J$;

thus, $\mu(A) \le \eta_j$ if $h_j$ is the characteristic function of the open set $A$.

We shall be interested in establishing sufficient conditions for $\mathcal{M}^*$ to be
nonempty. We may expect that such a result would enable us to handle many other
moment problems simply by adjoining new functions $h_j$ to the system $\mathcal{H}$ and
adding new conditions of the type (2.1) (see for instance [5], p. 569). Also
observe that (2.1) allows us to formulate a condition of the form

(2.2)  $\int g_i(x)\,\mu(dx) = \rho_i$,

where $g_i$ is a continuous function on $X$ and $i$ may run through an index set $I$ of any
cardinality. All one needs to do is to adjoin both $g_i$ and $-g_i$ to the system $\mathcal{H}$.

Let $R^J$ denote the collection of all real valued functions $\beta(\cdot)$ on $J$ such that
$\beta(j) = \beta_j = 0$ for all but finitely many $j \in J$. Similarly, let $R^J_+$ denote the
collection of all nonnegative functions in $R^J$. It is clear from (2.1) that

(2.3)  $\int \bigl[\alpha_0 + \sum_j \beta_j h_j\bigr]\,d\mu \le \alpha_0 + \sum_j \beta_j \eta_j$,

as soon as $\alpha_0 \in R$, $\beta(\cdot) \in R^J_+$ and $\mu \in \mathcal{M}^*$. Therefore, in order that $\mathcal{M}^*$ be
nonempty it is at least necessary that

(2.4)  $\alpha_0 + \sum_j \beta_j h_j \ge 0$, $\alpha_0 \in R$, $\beta \in R^J_+$ $\Longrightarrow$ $\alpha_0 + \sum_j \beta_j \eta_j \ge 0$.

Here, if $\phi: X \to R$ then $\phi \ge 0$ denotes that $\phi(x) \ge 0$ for all $x \in X$.
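On a finite space the inequality (2.3) can be verified directly. The sketch below is only an illustration under stated assumptions: the four-point space, the two functions $h_1, h_2$, the bounds $\eta_j$ and the coefficient choices are all hypothetical and do not come from the text.

```python
from fractions import Fraction as F

X = [0, 1, 2, 3]                                  # a finite stand-in for the space X
h = [[F(x) for x in X],                           # h_1(x) = x
     [F((x - 1) ** 2) for x in X]]                # h_2(x) = (x - 1)^2
eta = [F(3, 2), F(2)]                             # bounds eta_1, eta_2

def integral(mu, f):
    """mu(f) for a measure given by its point masses on X."""
    return sum(m * v for m, v in zip(mu, f))

def in_M_star(mu):
    """Membership in M*: probability measure with mu(h_j) <= eta_j for all j."""
    is_prob = all(m >= 0 for m in mu) and sum(mu) == 1
    return is_prob and all(integral(mu, hj) <= ej for hj, ej in zip(h, eta))

def combination(alpha0, beta):
    """The function alpha0 + sum_j beta_j h_j on X."""
    return [alpha0 + sum(b * hj[i] for b, hj in zip(beta, h)) for i in range(len(X))]

mu = [F(1, 4)] * 4                                # uniform measure, lies in M*
checks = []
for alpha0, beta in [(F(-1), (F(1), F(0))), (F(0), (F(2), F(3))), (F(-3), (F(0), F(5)))]:
    lhs = integral(mu, combination(alpha0, beta))                 # left side of (2.3)
    rhs = alpha0 + sum(b * e for b, e in zip(beta, eta))          # right side of (2.3)
    checks.append(lhs <= rhs)

print(in_M_star(mu), checks)
```

Whenever the combination is nonnegative on all of $X$, the left side is nonnegative, which forces the right side to be nonnegative as well; this is exactly how (2.4) arises from (2.3).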

Remark 2.1. Actually, (2.4) is already a necessary condition for a much
weaker property than $\mathcal{M}^*$ being nonempty. Namely, suppose instead that for
each choice of the finite subset $J'$ of $J$ and each choice of the numbers $\delta_j > 0$,
$j \in J'$, there exists a probability measure $\mu$ on $X$ such that

(2.5)  $\int h_j\,d\mu \le \eta_j + \delta_j$ for each $j \in J'$.

Obviously this implies (2.4).

2.2. The following examples will show that in fact condition (2.4) is not
sufficient for $\mathcal{M}^*$ to be nonempty.

Let $X = R$ and let $\mathcal{M}^*$ be determined by the conditions

(2.6)  $\int x^2\,\mu(dx) \ge 1$,  $\int e^{-x^2}\,\mu(dx) = 1$.

Then $\mathcal{M}^*$ is clearly empty though (2.4) is satisfied. More precisely, one may take
in this case $J = \{1, 2\}$, $h_1(x) = 1 - x^2$ with $\eta_1 = 0$ and $h_2(x) = 1 - e^{-x^2}$ with
$\eta_2 = 0$. That (2.4) is satisfied follows, for instance, from Remark 2.1 by
replacing $\eta_2 = 0$ by $\eta_2 = \delta$ with $\delta > 0$ arbitrarily small, and observing that there
do exist probability measures $\mu$ on $R$ for which $\int x^2\,d\mu \ge 1$ and
$\int e^{-x^2}\,d\mu \ge 1 - \delta$.
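One way to see that the relaxed conditions are satisfiable is to use a two-point measure. The sketch below assumes the particular choice $\mu = (1 - \delta)\varepsilon_0 + \delta \varepsilon_a$ with $a = (2/\delta)^{1/2}$, where $\varepsilon_x$ denotes the point mass at $x$; this choice is an illustration and is not taken from the text. For it, $\int x^2\,d\mu = 2 \ge 1$ while $\int e^{-x^2}\,d\mu \ge 1 - \delta$.

```python
import math

def relaxed_measure(delta):
    """Two-point measure (1 - delta) * eps_0 + delta * eps_a with a = sqrt(2 / delta)."""
    a = math.sqrt(2.0 / delta)
    return [(0.0, 1.0 - delta), (a, delta)]       # list of (point, mass) pairs

def integrate(mu, f):
    """Integral of f relative to a finitely supported measure."""
    return sum(mass * f(x) for x, mass in mu)

results = []
for delta in (0.5, 0.1, 0.001):
    mu = relaxed_measure(delta)
    second_moment = integrate(mu, lambda x: x * x)            # about 2, hence >= 1
    gauss = integrate(mu, lambda x: math.exp(-x * x))         # at least 1 - delta, below 1
    results.append((delta, second_moment, gauss))
    print(delta, second_moment, gauss)
```

As $\delta \to 0$ the atom at $a$ escapes to infinity, which is why no limiting probability measure satisfies (2.6) itself.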

As a second counterexample, take $X$ as the discrete space $X = \{1, 2, 3, \dots\}$.
Let $\{h_j, j \in J\}$ and $\{\eta_j, j \in J\}$ be such that $h_j(x) \ge 0$ always while

(2.7)  $\eta_j \ge \limsup_{x \to \infty} h_j(x)$,  $j \in J$,

and further

(2.8)  $\inf_j \{\eta_j / h_j(x): j \in J,\ h_j(x) > 0\} = 0$ for all $x \in X$.

(For instance, one may take $h_j(x) = x^{-j}$ and $\eta_j = 1/j!$, where $j = 1, 2, \dots$; or
take $\eta_j = 0 = \lim_{x \to \infty} h_j(x)$ and $\sup_j h_j(x) > 0$; or take $h_j(x) = j + j^3/x$ and
$\eta_j = j^2$, where $j = 1, 2, \dots$.)

Condition (2.4) is an immediate consequence of (2.7). On the other hand, let
$\mu$ be any nonnegative measure on $X$ satisfying (2.1). Then $\eta_j \ge h_j(x)\,\mu(\{x\})$ for
all $j \in J$, $x \in X$ and we conclude from (2.8) that $\mu(\{x\}) = 0$ for all $x \in X$, so that
$\mu$ cannot possibly be a probability measure. In other words, $\mathcal{M}^*$ is empty.
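For the first choice mentioned in this counterexample, $h_j(x) = x^{-j}$ and $\eta_j = 1/j!$, the bound on the mass of a single point is $\mu(\{x\}) \le \eta_j / h_j(x) = x^j / j!$. A minimal sketch (the function name `mass_bound` and the cutoff `j_max` are illustrative assumptions) shows how quickly this bound collapses as more indices $j$ are used.

```python
from fractions import Fraction
from math import factorial

def mass_bound(x, j_max):
    """Smallest value of eta_j / h_j(x) = x**j / j! over j = 1..j_max."""
    return min(Fraction(x ** j, factorial(j)) for j in range(1, j_max + 1))

# The bound on mu({x}) shrinks toward 0 as more side conditions are taken into account.
for x in (1, 2, 5):
    print(x, [float(mass_bound(x, j_max)) for j_max in (5, 20, 60)])
```

Since the infimum over all $j$ is $0$ at every point, each point carries no mass, in agreement with (2.8).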

As a last counterexample, take again $X = \{1, 2, 3, \dots\}$ and let (2.1) be of the
form $\mu(g_i) = 0$ for all $i \in I$. Here, we take $g_i: X \to R$ such that $g_i(x) \to 0$ as
$x \to \infty$, for all $i \in I$, so that condition (2.4) is trivially satisfied. Finally, suppose
that for each $x \in X$ and each $\varepsilon > 0$ there exists an index $i \in I$ such that $g_i(x) > 0$
and $g_i(x') \ge -\varepsilon g_i(x)$ for all $x' \in X$; (it would be sufficient that a single function
$g_i$ be positive and nonincreasing toward 0). If $\mu$ were a probability measure
satisfying $\mu(g_i) = 0$, we would have

(2.9)  $0 = \mu(g_i) \ge -(1 - \mu(\{x\}))\,\varepsilon g_i(x) + \mu(\{x\})\,g_i(x)$,

implying that $\mu(\{x\}) \le \varepsilon/(1 + \varepsilon)$. But $x \in X$ and $\varepsilon > 0$ are arbitrary; thus, $\mu = 0$
so that $\mathcal{M}^*$ is in fact empty.

Remark 2.2. We can think of other necessary conditions besides (2.4) for
$\mathcal{M}^*$ to be nonempty. For instance, one would be

(2.10)  $h_j(x) > 0$ for all $x \in X$ $\Longrightarrow$ $\eta_j > 0$.

In view of Fatou's lemma, another necessary condition would be

(2.11)  $h_j(x) \ge 0$, $\beta_j \ge 0$, $\sum_j \beta_j h_j(x) = \infty$ for all $x$ $\Longrightarrow$ $\sum_j \beta_j \eta_j = \infty$,

where $j$ is to be restricted to some denumerable subset of $J$. In the case of
conditions of the form $\mu(g_i) = \rho_i$, $i = 1, 2, \dots$, the dominated convergence theorem
yields as a further necessary condition that $\sum_i \alpha_i \rho_i = 0$ as soon as $\sum_i \alpha_i g_i(x) = 0$
for all $x \in X$ and further that $\sum_i |\alpha_i g_i(x)| \le \sum_j \beta_j h_j(x)$ for some choice of the
numbers $\beta_j \ge 0$ and the functions $h_j \ge 0$ in $\mathcal{H}$ with $\sum_j \beta_j \eta_j < \infty$, $j$ being
restricted to a countable subset of $J$.

Some  of  the  above  additional  necessary  conditions  are  in  fact  violated  by  the 
counterexamples  outlined  in  this  section.  Nevertheless,  if  possible  we  would 
clearly  prefer  to  avoid  using  conditions  of  the  type  (2.11)  since  they  are  hard  to 
verify. 

One would also like to keep the system $\mathcal{H} = \{h_j, j \in J\}$ as small as possible
so that (2.4) may not be too hard to verify. Naturally, using the properties of an
integral (such as Fatou's lemma and linearity) one can usually enlarge $\mathcal{H}$
considerably without affecting the class $\mathcal{M}^*$, but we will refrain from doing this.

Definition 2.1. If $h$ and $\phi$ are real valued functions on the locally compact
space $X$, we will say that $h$ is dominated below at infinity by $\phi$ when, for each
number $\varepsilon > 0$, there exists a compact set $K_\varepsilon \subset X$ such that

(2.12)  $h(x) \ge -\varepsilon |\phi(x)|$ for each $x \notin K_\varepsilon$.

Observe that it would be sufficient that $h$ be nonnegative or that $X$ itself be compact.

If (2.12) is replaced by $h(x) \le \varepsilon |\phi(x)|$ for each $x \notin K_\varepsilon$, we will say that $h$ is
dominated above at infinity by $\phi$. If both properties hold we say that $h$ is
dominated at infinity by $\phi$.
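Definition 2.1 can be probed numerically on $X = R$. The sketch below uses hypothetical functions that are not from the text: $\phi(x) = 1 + x^2$, the function $h(x) = -|x|$, for which $K_\varepsilon = [-1/\varepsilon, 1/\varepsilon]$ works in (2.12), and $h(x) = -x^2$, for which no compact set works once $\varepsilon < 1$. A check at finitely many sample points is only supporting evidence, not a proof.

```python
def dominated_below_outside(h, phi, eps, radius, samples):
    """Check h(x) >= -eps * |phi(x)| at the sample points with |x| > radius."""
    return all(h(x) >= -eps * abs(phi(x)) for x in samples if abs(x) > radius)

phi = lambda x: 1.0 + x * x          # hypothetical dominating function
h_ok = lambda x: -abs(x)             # dominated below at infinity by phi
h_bad = lambda x: -x * x             # not dominated below at infinity by phi

def samples_outside(radius):
    """A few sample points on both sides, all outside [-radius, radius]."""
    pts = [radius * s for s in (1.001, 2.0, 10.0, 1e3)]
    return pts + [-p for p in pts]

# For h_ok the compact set K_eps = [-1/eps, 1/eps] satisfies (2.12).
ok = all(
    dominated_below_outside(h_ok, phi, eps, 1.0 / eps, samples_outside(1.0 / eps))
    for eps in (1.0, 0.1, 0.01)
)
# For h_bad and eps = 0.5 the inequality already fails just outside [-100, 100].
bad = dominated_below_outside(h_bad, phi, 0.5, 100.0, samples_outside(100.0))
print(ok, bad)
```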

Definition 2.2. Let $\Phi$ denote the class of all nonnegative functions $\phi: X \to R$
which admit at least one representation as

(2.13)  $\phi(x) = \alpha_0 + \sum_j \beta_j h_j(x)$,  $x \in X$,

with $\alpha_0 \in R$, $\beta(\cdot) \in R^J_+$. Observe that each $\phi \in \Phi$ is lower semicontinuous and also
that $\Phi$ is a convex cone in the obvious sense.
Only for the purpose of a proof, we further associate to each $\phi \in \Phi$ a
nonnegative number $q(\phi)$ such that $q(\phi) \ge \alpha_0 + \sum_j \beta_j \eta_j$ for at least one
representation of the form (2.13). Observe that, by (2.1), we have $\mu(\phi) \le q(\phi)$ for
each $\mu \in \mathcal{M}^*$.

In the sequel we shall make frequent use of the function $g_0$ on $X$ defined by
$g_0(x) = 1$ for all $x \in X$. Therefore, $\mu(g_0) = 1$ for each $\mu \in \mathcal{M}^*$.

Theorem 2.1. Suppose that, for each $j \in J$, the function $h_j$ is dominated below
at infinity by some $\phi_j \in \Phi$; it would be sufficient that $h_j \ge 0$. Suppose further that
the special function $g_0$ is dominated at infinity by some $\phi_0 \in \Phi$, and let $\mathcal{M}^*$ denote
the collection of all regular probability measures on $X$ satisfying (2.1). Then

(i) the collection $\mathcal{M}^*$ is nonempty if and only if condition (2.4) holds;

(ii) $\mathcal{M}^*$ is a convex set which is compact in the weak* topology.

Here, as usual, the weak* topology is taken relative to the class $\mathcal{K}(X)$ of all
continuous functions $f$ on $X$ having a compact support (that is, $\{x: f(x) \ne 0\}$
has a compact closure). Thus a net $\{\mu_i, i \in I\}$ of regular Borel measures
converges to $\mu$ if and only if $\mu_i(f) \to \mu(f)$ for all $f \in \mathcal{K}(X)$. Concrete applications
of Theorem 2.1 may be found in [5].

For the moment, consider a pair $h_j$, $\phi_j$ as in the theorem. Since $\phi_j \ge 0$ it is
integrable relative to a nonnegative finite measure $\mu$ as soon as $\mu(\phi_j) < \infty$. We
claim that this implies that $h_j$ is at least improperly integrable so that $\mu(h_j)$ is
well defined. After all, $h_j(x) \ge -\varepsilon \phi_j(x)$ for $x$ outside some compact set $K_\varepsilon$,
while on $K_\varepsilon$ the lower semicontinuous function $h_j$ is bounded below.

Assertion (i) of Theorem 2.1 was already established in [5], pp. 565 and 570.
(Apply Theorem 4.1 of [5] with $F$ as the linear manifold spanned by $g_0$ and the
$h_j$, $j \in J$, and take $F_+$ as the convex cone of all $f \in F$ such that $f \ge \psi$ for some
(upper semicontinuous) function $\psi$ of the form $\psi = q(\phi) g_0 - \phi$; here $\phi \in \Phi$,
while the scalars $q(\phi)$ are chosen such that $\mu \in \mathcal{M}^*$ if and only if $\mu(g_0) = 1$ and
$\mu(\psi) \ge 0$ for all such $\psi$; the present condition (2.4) corresponds to the
condition $-g_0 \notin F_+$ of [5]; condition (4.1) of [5] appears unnecessary.)

The proof in [5] used the classical Hahn-Banach theorem together with the
Riesz representation theorem. The proof below of (i) and (ii) relies more heavily
on the linear space $\mathcal{M}(X)$ of all real valued finite signed regular measures on $X$,
made into a locally convex topological vector space by means of the weak*
topology. Further,

(2.14)  $B(X) = \{\mu \in \mathcal{M}(X): \mu \ge 0,\ \|\mu\| \le 1\}$

will denote the set of all nonnegative measures $\mu \in \mathcal{M}(X)$ of total mass $\le 1$.
It is essential for the proof that $B(X)$ is not only convex but also compact (in the
weak* topology).

Proof of Theorem 2.1. In the sequel, we shall assume all the conditions
of Theorem 2.1 and further condition (2.4), since, otherwise, $\mathcal{M}^*$ would be
empty.

For each finite subset $J'$ of $J$, let $\mathcal{M}^*(J')$ denote the set of all $\mu \in B(X)$
satisfying

(2.15)  $\mu(g_0) = 1$,  $\mu(h_j) \le \eta_j$ if $j \in J'$,

and

(2.16)  $\mu(\phi_0) \le q(\phi_0)$,  $\mu(\phi_j) \le q(\phi_j)$ if $j \in J'$.

We easily verify that the class $\mathcal{M}^*$ is precisely equal to the intersection of the
collection of all such classes $\mathcal{M}^*(J')$. Moreover, this collection has the finite
intersection property, since $\mathcal{M}^*(J') \cap \mathcal{M}^*(J'') = \mathcal{M}^*(J' \cup J'')$. Hence, in order
to prove the theorem, that is, in order to prove that $\mathcal{M}^*$ is nonempty and compact,
it suffices to prove that each individual class $\mathcal{M}^*(J')$ is nonempty and compact.

From now on, let $J'$ be fixed. For convenience, we shall take $J' = \{1, 2, \dots, n\}$.
Using condition (2.4) and the definitions of $\Phi$ and $q(\phi)$, we easily see that for
any choice of the real constants $\alpha$, $\beta_j \ge 0$, $j = 1, \dots, n$, and $\gamma_j \ge 0$, $j = 0, 1, \dots, n$,
we have

(2.17)  $\alpha + \sum_{j=1}^{n} \beta_j \eta_j + \sum_{j=0}^{n} \gamma_j q(\phi_j) \ge 0$,

as soon as

(2.18)  $\alpha g_0(x) + \sum_{j=1}^{n} \beta_j h_j(x) + \sum_{j=0}^{n} \gamma_j \phi_j(x) \ge 0$ for $x \in X$.

Introducing $\phi_{n+j} = h_j$, $j = 1, \dots, n$, and

(2.19)  $\zeta_j = q(\phi_j)$, $j = 0, 1, \dots, n$,  $\zeta_{n+j} = \eta_j$, $j = 1, \dots, n$,

this implication can be restated as

(2.20)  $\alpha + \sum_{j=0}^{2n} \gamma_j \phi_j \ge 0$, $\gamma_j \ge 0$ $\Longrightarrow$ $\alpha + \sum_{j=0}^{2n} \gamma_j \zeta_j \ge 0$.

Here, the functions $\phi_j$, $j = 0, 1, \dots, 2n$, are all lower semicontinuous.
Moreover, $\phi_j \ge 0$, $j = 0, 1, \dots, n$; thus $\zeta_j \ge 0$, $j = 0, \dots, n$. Moreover, $g_0 = 1$ is
dominated at infinity by $\phi_0$, while $\phi_{n+j}$ is dominated below at infinity by
$\phi_j$, $j = 1, \dots, n$. Finally, $\mathcal{M}^*(J')$ can now also be described as the
collection of all $\mu \in B(X)$ such that

(2.21)  $\mu(g_0) = 1$,  $\mu(\phi_j) \le \zeta_j$,  $j = 0, 1, \dots, 2n$.

We shall first prove the following three results:

(i) for $j = 0, 1, \dots, n$, the functions $\mu \to \mu(\phi_j)$ are lower semicontinuous on
$B(X)$; in particular we have that the set $\{\mu \in B(X): \mu(\phi_j) \le c\}$ is always closed,
$j = 0, 1, \dots, n$;

(ii) on the set $A(c) = \{\mu \in B(X): \mu(\phi_0) \le c\}$ (with $c$ as a finite constant) the
function $\mu \to \mu(g_0)$ is continuous; thus $\{\mu \in B(X): \mu(\phi_0) \le c,\ \mu(g_0) = 1\}$ is a
closed set;

(iii) let $1 \le j \le n$; then on the set

(2.22)  $B_j(c) = \{\mu \in B(X): \mu(g_0) = 1,\ \mu(\phi_j) \le c\}$

(with $c$ as a finite constant) the function $\mu \to \mu(\phi_{n+j}) = \mu(h_j)$ is lower
semicontinuous.
One easily verifies that (i), (ii), (iii) together imply that the set $\mathcal{M}^*(J')$ defined by
(2.21) is in fact a closed subset of $B(X)$ and therefore compact.

To prove (i) let $0 \le j \le n$ be fixed. Then $\phi_j$ is nonnegative and lower
semicontinuous, in which case

(2.23)  $\mu(\phi_j) = \sup \{\mu(f): f \in \mathcal{K}(X),\ 0 \le f \le \phi_j\}$

holds for each $\mu \in B(X)$ (see [1], p. 104). Here, by the definition of the weak*
topology each function $\mu \to \mu(f)$ is a continuous function of $\mu$. Hence, it follows
from (2.23) that the function $\mu \to \mu(\phi_j)$ is lower semicontinuous on $B(X)$.

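For the proof of (ii), we use that $g_0$ is dominated at infinity by $\phi_0 \ge 0$.
Let $\varepsilon > 0$ and choose the compact set $K_\varepsilon$ such that $1 = |g_0(x)| \le \varepsilon \phi_0(x)$ for
each $x \notin K_\varepsilon$. Next choose $\psi_\varepsilon$ in $\mathcal{K}(X)$ such that $\psi_\varepsilon(x) = 1 = g_0(x)$ for $x \in K_\varepsilon$
and $0 \le \psi_\varepsilon(x) \le 1$, otherwise. It follows that

(2.24)  $|\mu(g_0) - \mu(\psi_\varepsilon)| \le \int_{X \setminus K_\varepsilon} d\mu \le \varepsilon \int \phi_0(x)\,d\mu \le \varepsilon c$,

as long as $\mu \in A(c)$. Hence, on $A(c)$ the function $\mu \to \mu(g_0)$ is the uniform limit of
the continuous functions $\mu \to \mu(\psi_\varepsilon)$ and therefore itself continuous.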

To prove (iii), let $1 \le j \le n$ be fixed. We know that the function $\phi_{n+j} = h_j$
on $X$ is dominated below at infinity by the nonnegative function $\phi_j$. Hence, for
each $\varepsilon > 0$ there exists a compact set $K_\varepsilon$ in $X$ such that

(2.25)  $h_j(x) + \varepsilon \phi_j(x) \ge 0$ for $x \notin K_\varepsilon$.

Here, the left side defines a lower semicontinuous function which therefore is
bounded below on $K_\varepsilon$. Consequently, there exists a constant $a_\varepsilon$ such that
$a_\varepsilon g_0 + h_j + \varepsilon \phi_j \ge 0$ everywhere. In view of a relation of the type (2.23), the function
$\mu \to \mu(a_\varepsilon g_0 + h_j + \varepsilon \phi_j)$ is lower semicontinuous throughout $B(X)$. Hence, on
$B_j(c)$ the function

(2.26)  $\mu(h_j + \varepsilon \phi_j) = \mu(a_\varepsilon g_0 + h_j + \varepsilon \phi_j) - a_\varepsilon$

is lower semicontinuous. But on $B_j(c)$ we also have

(2.27)  $|\mu(h_j) - \mu(h_j + \varepsilon \phi_j)| \le \varepsilon \mu(\phi_j) \le \varepsilon c$.

Therefore, on $B_j(c)$ the function $\mu \to \mu(h_j)$ is the uniform limit of lower
semicontinuous functions and thus itself lower semicontinuous.

It remains to prove that the compact set $\mathcal{M}^*(J')$ is nonempty. Let $D$ denote the
set of all points $w \in R^{2n+1}$ such that there exists a measure $\mu \in B(X)$ satisfying

(2.28)  $\mu(g_0) = 1$,  $\mu(\phi_j) \le w_j$,  $j = 0, 1, \dots, 2n$.

Clearly the set $D$ is convex. Using the above results (i), (ii), (iii) and the fact
that $B(X)$ itself is compact, we easily see that the set $D$ is also closed. Thus, $D$ is
equal to an intersection of a collection of closed half spaces.

It is given that (2.20) holds and we must prove that the set $\mathcal{M}^*(J')$ defined by
(2.21) is nonempty. In other words, we must prove that (2.20) implies $z \in D$,
where $z = (\zeta_0, \zeta_1, \dots, \zeta_{2n})$. It suffices to prove that $z \in H$, where $H$ is any closed
half space containing $D$. Let this half space have the form

(2.29)  $H = \bigl\{w: \alpha + \sum_{j=0}^{2n} \gamma_j w_j \ge 0\bigr\}$,

where $\alpha$ and $\gamma_j$ denote real constants. Since $D \subset H$ and $D$ is unbounded in each
positive $w_j$ direction we must have that $\gamma_j \ge 0$.

Considering a probability measure $\varepsilon_x$ on $X$ supported by a single point $x \in X$,
we see that $w_j = \phi_j(x)$, $j = 0, 1, \dots, 2n$, defines a point $w \in D \subset H$; hence, we
have that

(2.30)  $\alpha + \sum_{j=0}^{2n} \gamma_j \phi_j(x) \ge 0$ for all $x \in X$.

Invoking (2.20), we conclude that $\alpha + \sum_{j=0}^{2n} \gamma_j \zeta_j \ge 0$, that is, $z \in H$. This
completes the proof of Theorem 2.1.

Remark 2.3. In Theorem 2.1 the condition (2.4) can also be replaced by

(2.31)  $\alpha_0 + \sum_j \beta_j h_j > 0$, $\alpha_0 \in R$, $\beta \in R^J_+$ $\Longrightarrow$ $\alpha_0 + \sum_j \beta_j \eta_j > 0$,

or by

(2.32)  $\alpha_0 + \sum_j \beta_j h_j > 0$, $\alpha_0 \in R$, $\beta \in R^J_+$ $\Longrightarrow$ $\alpha_0 + \sum_j \beta_j \eta_j \ge 0$.

After all, given the other (domination type) conditions of the theorem, we have
the implications

(2.33)  (2.4) $\Longrightarrow$ $\mathcal{M}^*$ nonempty $\Longrightarrow$ (2.31) $\Longrightarrow$ (2.32) $\Longrightarrow$ (2.4).

Here, the first implication follows from Theorem 2.1, while the others are more
or less obvious. For an important special case (with $X$ compact) the equivalence
between $\mathcal{M}^* \ne \emptyset$ and (2.31) is due to Ky Fan ([3], p. 68).

The following variation of Theorem 2.1 is often useful in applications.

Theorem 2.2. As in Theorem 2.1, assume that $g_0 = 1$ is dominated at infinity
by some $\phi_0$ in $\Phi$ and further that each $h_j$, $j \in J$, is dominated below at infinity by
some $\phi_j$ in $\Phi$. Assume also that $\mathcal{M}^*$ is nonempty; equivalently, assume that (2.4)
holds.

Next, let $f$ be a fixed upper semicontinuous function on $X$ which is dominated
above at infinity by some $\phi$ in $\Phi$, and define

(2.34)  $q(f) = \inf \bigl\{\alpha + \sum_j \beta_j \eta_j: \alpha + \sum_j \beta_j h_j \ge f\bigr\}$;

here, $\alpha$ ranges through $R$ while $\beta(\cdot)$ ranges through $R^J_+$.

Clearly, $M(f) \le q(f) < \infty$, where

(2.35)  $M(f) = \sup \{\mu(f): \mu \in \mathcal{M}^*\}$.

We assert that in fact $M(f) = q(f)$ and further that the supremum in (2.35) is
assumed. In fact, the set of $\mu \in \mathcal{M}^*$ satisfying $\mu(f) = M(f)$ is nonempty, convex
and compact (in the weak* topology).
Proof. Let $f$ be dominated above at infinity by $\phi \in \Phi$; in particular $f \le \phi'$ for
some $\phi' \in \Phi$, thus, $q(f) < \infty$. Put $h = -f$ so that $h$ is lower semicontinuous and
dominated below at infinity by the function $\phi$ in $\Phi$. Adjoining $h$ to $\mathcal{H} = \{h_j, j \in J\}$, one
obtains a new class

(2.36)  $\mathcal{M}^*(\gamma) = \{\mu \in \mathcal{M}^*: \mu(h) \le -\gamma\} = \{\mu \in \mathcal{M}^*: \mu(f) \ge \gamma\}$,

which depends on the choice of $\gamma$. It follows from Theorem 2.1 that $\mathcal{M}^*(\gamma)$ is
always compact and convex. From (2.35), it is empty when $\gamma > M(f)$ and
nonempty when $\gamma < M(f)$. Letting $\gamma \uparrow M(f)$, we conclude that
$\{\mu \in \mathcal{M}^*: \mu(f) = M(f)\}$ is a nonempty compact and convex set.

Theorem 2.1 also supplies a necessary and sufficient condition for $\mathcal{M}^*(\gamma)$ to
be nonempty, namely, condition (2.4) applied to $\mathcal{H} \cup \{h\}$ instead of $\mathcal{H}$. It turns
out that $\mathcal{M}^*(\gamma)$ is nonempty if and only if $\gamma \le q(f)$. Consequently, we have
$M(f) = q(f)$.

Remark 2.4. The condition of Theorem 2.2 that $f$ be dominated at infinity
by some $\phi \in \Phi$ cannot be omitted. For instance, take $X = [1, +\infty)$ and
$\{h_j, j \in J\}$ as the single function $h(x) = x^2 + x^{-2}$. Thus $\mathcal{M}^*$ consists of all
$\mu \in B(X)$ with $\mu(g_0) = 1$ and $\mu(h) \le \eta$. Finally, let $f(x) = x^2$. Clearly, $M(f) =
q(f) = \eta$, but, nevertheless, we have $\mu(f) < \eta$ for all $\mu \in \mathcal{M}^*$.

Remark 2.5. The assertion $M(f) = q(f)$ of Theorem 2.2 is obviously related
to the so-called fundamental theorem of linear programming (see [5], pp. 558,
561). Of special interest would be the case where not only the supremum $M(f)$
is attained by at least one measure $\mu_0 \in \mathcal{M}^*$, but where also the infimum $q(f)$ is
attained by a pair $\alpha \in R$, $\beta(\cdot) \in R^J_+$. This situation will be taken up in Section 3.
We easily verify that the measure $\mu_0$ must be carried by the measurable set $S$ of
points $x \in X$ for which $\alpha + \sum_j \beta_j h_j(x) = f(x)$; moreover, $\mu_0(h_j) = \eta_j$ whenever
$\beta_j > 0$. Conversely, every $\mu_0 \in \mathcal{M}^*$ with these properties does attain $M(f)$.
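The relation between $M(f)$, $q(f)$ and the set $S$ can be illustrated on a finite space with a single side condition. The following is a sketch under hypothetical choices ($X = \{0, \dots, 4\}$, $h(x) = x^2$, $f(x) = x$, $\eta = 2$), none of which come from the text. It enumerates the extreme points of the primal problem (measures with at most two atoms) and the breakpoints of the dual objective $\beta \mapsto \beta \eta + \max_x (f(x) - \beta h(x))$, instead of calling a linear programming solver.

```python
from fractions import Fraction as F
from itertools import combinations

X = range(5)                        # finite stand-in for the space X
h = [F(x * x) for x in X]           # single constraint function h(x) = x^2
f = [F(x) for x in X]               # objective f(x) = x
eta = F(2)                          # side condition mu(h) <= eta

def primal():
    """M(f): maximize mu(f) over the extreme points of {mu: mu(h) <= eta}."""
    best_val, best_mu = None, None
    candidates = [{x: F(1)} for x in X if h[x] <= eta]          # point masses
    for x1, x2 in combinations(X, 2):                            # two-point mixtures
        lo, hi = (x1, x2) if h[x1] < h[x2] else (x2, x1)
        if h[lo] < eta < h[hi]:
            p = (eta - h[lo]) / (h[hi] - h[lo])                  # mass at hi, so mu(h) = eta
            candidates.append({lo: 1 - p, hi: p})
    for mu in candidates:
        val = sum(m * f[x] for x, m in mu.items())
        if best_val is None or val > best_val:
            best_val, best_mu = val, mu
    return best_val, best_mu

def dual():
    """q(f): minimize alpha + beta * eta subject to alpha + beta * h >= f, beta >= 0."""
    betas = {F(0)}
    for x1, x2 in combinations(X, 2):
        if h[x1] != h[x2]:
            b = (f[x1] - f[x2]) / (h[x1] - h[x2])               # breakpoint of the dual objective
            if b >= 0:
                betas.add(b)
    best = None
    for beta in betas:
        alpha = max(f[x] - beta * h[x] for x in X)               # smallest feasible alpha
        val = alpha + beta * eta
        if best is None or val < best[0]:
            best = (val, alpha, beta)
    return best

M_f, mu0 = primal()
q_f, alpha, beta = dual()
contact = [x for x in X if alpha + beta * h[x] == f[x]]          # the set S of Remark 2.5
print(M_f, q_f, mu0, contact)
```

In this example the maximizing measure sits on the contact set and exhausts the side condition, as described in Remark 2.5.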


3.  Admissible  measures 

In this section, $X$ denotes an arbitrary measurable space and $\mathcal{M}$ a given
nonempty convex collection of (nonnegative) measures on $X$. We shall be interested
in the best lower bound

(3.1)  $L(y) = L(y \mid h) = \inf \{\mu(h): \mu \in \mathcal{M},\ \mu(g) = y\}$.

Here, $g = (g_1, \dots, g_n)$ is a given measurable function $g: X \to R^n$ which is
integrable relative to each $\mu \in \mathcal{M}$. Further, $h: X \to R$ denotes a given measurable
function which is integrable relative to each $\mu \in \mathcal{M}$. Finally, $y = (y_1, \dots, y_n)$
denotes a variable point in $R^n$.

Clearly, $L(y) < +\infty$ if and only if $y$ belongs to the so-called moment space

(3.2)  $M = \{y \in R^n: \mu(g) = y$ for some $\mu \in \mathcal{M}\}$.

Here, $M$ is a convex subset of $R^n$ since the collection $\mathcal{M}$ was assumed to be
convex. We may (and shall) assume that $M$ has a nonempty interior; for, otherwise
the components $g_j$ of $g$ would be linearly dependent as far as the measures
$\mu \in \mathcal{M}$ are concerned so that part of the information $\mu(g) = y$ would be
redundant.

The function $L(y)$ is clearly convex. Excluding the situation that $L(y) = -\infty$
for all $y \in \mathrm{int}(M)$, it follows that $L(y)$ is finite everywhere on $M$ and even
continuous throughout $\mathrm{int}(M)$.

For each measurable function $\psi: X \to R$, let us introduce

(3.3)  $\mu_{\min}(\psi) = \inf \{\mu(\psi): \mu \in \mathcal{M}\}$;

put $\mu_{\min}(\psi) = -\infty$ if $\psi$ is not even improperly integrable relative to some
$\mu \in \mathcal{M}$. If $\mu_{\min}(\psi)$ is finite, then a measure $\mu$ on $X$ will be said to be critical relative
to $\psi$ if

(3.4)  $\mu \in \mathcal{M}$,  $\mu(\psi) = \mu_{\min}(\psi)$.

Example 3.1. Let $\mathcal{M}$ consist of all probability measures on $X$. In this case

(3.5)  $\mu_{\min}(\psi) = \inf \psi = \inf \{\psi(x): x \in X\}$.

Moreover, $\mu_0 \in \mathcal{M}$ is critical relative to $\psi$ if and only if it is carried by the
"contact" set

(3.6)  $S(\psi) = \{x \in X: \psi(x) = \inf \psi\}$.
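On a finite space the statement of Example 3.1 can be checked by direct computation. The sketch below assumes a hypothetical four-point space and hypothetical values of $\psi$; it compares the two properties "critical relative to $\psi$" and "carried by $S(\psi)$" for a few probability measures.

```python
from fractions import Fraction as F

psi = {"a": F(2), "b": F(-1), "c": F(3), "d": F(-1)}   # hypothetical values of psi on X

mu_min = min(psi.values())                             # (3.5): mu_min(psi) = inf psi
contact = {x for x, v in psi.items() if v == mu_min}   # (3.6): contact set S(psi)

def integral(mu):
    """mu(psi) for a measure given as {point: mass}."""
    return sum(m * psi[x] for x, m in mu.items())

def is_critical(mu):
    return integral(mu) == mu_min

def carried_by_contact(mu):
    return all(x in contact for x, m in mu.items() if m > 0)

measures = [
    {"b": F(1)},
    {"b": F(1, 3), "d": F(2, 3)},
    {"a": F(1, 2), "b": F(1, 2)},
    {"a": F(1, 4), "b": F(1, 4), "c": F(1, 4), "d": F(1, 4)},
]
agree = [is_critical(mu) == carried_by_contact(mu) for mu in measures]
print(sorted(contact), agree)
```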

Example 3.2. Let $\lambda$ be a fixed measure on $X$ and let $0 \le a(x) \le b(x)$ be
given measurable functions on $X$ which are integrable relative to $\lambda$. Finally, let
$\mathcal{M}$ consist of all measures on $X$ of the form

(3.7)  $\mu(A) = \int_A p(x)\,\lambda(dx)$,  $a(x) \le p(x) \le b(x)$.

This measure $\mu$ will be critical relative to a function $\psi$ if and only if the
corresponding function $p(x)$ is such that $p(x) = a(x)$ for almost [$\lambda$] all $x$ with $\psi(x) > 0$
and further $p(x) = b(x)$ for almost [$\lambda$] all $x$ with $\psi(x) < 0$. Here, we are
assuming that $\int |\psi|\,b\,d\lambda < \infty$.
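A discrete version of Example 3.2 makes the criticality rule concrete. In the sketch below the five-point space, the weights of $\lambda$, the bounds $a$, $b$ and the function $\psi$ are hypothetical; the critical density is compared with a grid of feasible densities, which is an illustration of the rule rather than a proof of it.

```python
from fractions import Fraction as F
from itertools import product

lam = [F(1), F(2), F(1), F(1, 2), F(3)]          # point masses of lambda (hypothetical)
a = [F(0), F(1), F(1, 2), F(0), F(1)]            # lower density bound a(x)
b = [F(1), F(2), F(1), F(3), F(1)]               # upper density bound b(x)
psi = [F(2), F(-1), F(0), F(-3), F(1)]           # the function psi

def mu_psi(p):
    """mu(psi) for the measure with density p relative to lambda."""
    return sum(v * d * w for v, d, w in zip(psi, p, lam))

# Critical density: p = a where psi > 0, p = b where psi < 0 (free where psi = 0).
p_crit = [ai if v > 0 else bi if v < 0 else ai for ai, bi, v in zip(a, b, psi)]

# Compare against a grid of feasible densities a <= p <= b.
grid = product(*[(ai, (ai + bi) / 2, bi) for ai, bi in zip(a, b)])
grid_min = min(mu_psi(p) for p in grid)
print(mu_psi(p_crit), grid_min)
```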

Example 3.3. Let $\mathcal{M}$ consist of all measures on $X$ of the form

(3.8)  $\mu(A) = \int P(u, A)\,\nu(du)$.

Here, $\nu$ denotes an arbitrary probability measure on a fixed measurable space
$U$, while $P$ is a given Markov kernel function of $u \in U$, $A \subset X$. If $\psi \ge 0$ is
measurable then $\mu(\psi) = \nu(\tilde{\psi})$, where

(3.9)  $\tilde{\psi}(u) = \int \psi(x)\,P(u, dx)$.

Thus, $\mu$ is critical relative to $\psi$ if and only if the corresponding measure $\nu$ is
carried by the contact set $S(\tilde{\psi}) \subset U$.

Lemma 3.1. Consider any function $\psi$ (a so-called polynomial) of the special form

(3.10)  $\psi(x) = h(x) - \sum_{j=1}^{n} d_j g_j(x)$,

where the $d_j$ denote real constants. Then

(3.11)  $L(z) \ge \mu_{\min}(\psi) + \sum_{j=1}^{n} d_j z_j$ for all $z \in M$.

Moreover, given $y \in \mathrm{int}(M)$, one can always choose this polynomial $\psi$ in such a
way that

(3.12)  $L(y) = \mu_{\min}(\psi) + \sum_{j=1}^{n} d_j y_j$;

here, $\psi$ is unique for almost all $y \in \mathrm{int}(M)$.

Definition 3.1. A measure $\mu_0$ on $X$ will be said to be admissible if $\mu_0 \in \mathcal{M}$
and further $\mu_0$ is critical relative to some polynomial $\psi$ of the form (3.10).

For instance, in Example 3.1 a probability measure $\mu_0$ is admissible if and
only if it is supported by the contact set $S(\psi)$ of some polynomial $\psi$.

Theorem 3.1. Each admissible measure $\mu_0$ assumes $L(y)$ in the sense that

(3.13)  $L(y) = \mu_0(h)$, where $y = \mu_0(g)$.

Conversely, if $y \in \mathrm{int}(M)$ and $L(y)$ is assumed by $\mu_0 \in \mathcal{M}$, then $\mu_0$ is admissible
(and the corresponding polynomial $\psi$ is unique for almost all $y \in \mathrm{int}(M)$).

Consequently, if $L(y)$ is assumed for all $y \in \mathrm{int}(M)$, then (3.13) with $\mu_0$ running
through all admissible measures will yield a parametric representation of the
function $L(y)$ at least for $y \in \mathrm{int}(M)$.
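In the setting of Example 3.1 with $n = 1$, the function $L(y)$ is the lower convex envelope of the points $(g(x), h(x))$, and the parametric representation (3.13) can be traced by hand. The sketch below assumes a hypothetical five-point space with $g(x) = x$ and hypothetical values of $h$. It computes $L(y)$ once directly, using measures with at most two atoms, and once through (3.11) and (3.12), by maximizing $\mu_{\min}(\psi) + d y$ over candidate slopes $d$; the contact set of the maximizing polynomial then carries the admissible measures with $\mu_0(g) = y$.

```python
from fractions import Fraction as F
from itertools import combinations

X = range(5)
g = [F(x) for x in X]                    # single moment function g(x) = x (n = 1)
h = [F(v) for v in (0, 3, 1, 4, 4)]      # hypothetical function h on X

def L_primal(y):
    """inf of mu(h) over probability measures with mu(g) = y (at most two atoms)."""
    vals = [h[x] for x in X if g[x] == y]
    for x1, x2 in combinations(X, 2):
        if g[x1] < y < g[x2]:
            p = (y - g[x1]) / (g[x2] - g[x1])          # mass at x2
            vals.append((1 - p) * h[x1] + p * h[x2])
    return min(vals)

def L_dual(y):
    """sup over d of mu_min(psi) + d * y, with psi = h - d * g, as in (3.11)-(3.12)."""
    slopes = {(h[x2] - h[x1]) / (g[x2] - g[x1]) for x1, x2 in combinations(X, 2)}
    best = None
    for d in slopes:
        mu_min = min(h[x] - d * g[x] for x in X)       # (3.5) for all probability measures
        val = mu_min + d * y
        if best is None or val > best[0]:
            contact = [x for x in X if h[x] - d * g[x] == mu_min]
            best = (val, d, contact)
    return best

for y in (F(1, 2), F(1), F(2), F(3), F(7, 2)):
    val, d, contact = L_dual(y)
    print(y, L_primal(y), val, d, contact)
```

At $y = 2$ several slopes $d$ give the same value, which reflects the fact that the polynomial $\psi$ is unique only for almost all $y$.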

Proof of Lemma 3.1. (Another proof is given in [5], p. 574.) Consider a
measure $\mu \in \mathcal{M}$ with $\mu(g) = z$. Integrating (3.10), we find that

(3.14)  $\mu(h) = \mu(\psi) + \sum_{j=1}^{n} d_j z_j \ge \mu_{\min}(\psi) + \sum_{j=1}^{n} d_j z_j$.

This implies (3.11). Considering $\mu \in \mathcal{M}$ with $\mu(\psi)$ close to $\mu_{\min}(\psi)$ and then
taking $z = \mu(g)$, we see that the constant term $\mu_{\min}(\psi)$ in (3.11) cannot be
improved. That is, $y = \mu_{\min}(\psi) + d_1 z_1 + \cdots + d_n z_n$ is the best supporting
hyperplane in the direction $(d_1, \dots, d_n)$ to the convex set $Q$ in $R^{n+1}$ consisting
of all points $(z, y)$ with $z \in M$ and $y \ge L(z)$.

Conversely, consider a fixed point $y \in \mathrm{int}(M)$. Then through the boundary
point $(y, L(y))$ of $Q$ there passes a supporting hyperplane to $Q$. Since $y \in \mathrm{int}(M)$,
this hyperplane is nonvertical and of the form $y = d_0 + \sum_{j=1}^{n} d_j z_j$. That is,
$L(z) \ge d_0 + \sum_{j=1}^{n} d_j z_j$ for all $z \in M$, while $L(y) = d_0 + \sum_{j=1}^{n} d_j y_j$. It follows
from the above remarks that necessarily $d_0 = \mu_{\min}(\psi)$, where $\psi$ denotes the
polynomial defined by (3.10). This yields assertion (3.12).

The uniqueness of $\psi$, for almost all $y \in \mathrm{int}(M)$, follows from the well-known
uniqueness of a supporting hyperplane through the boundary point $(y, L(y))$ of
the convex body $Q$, again for almost all boundary points, that is, for almost all
$y \in \mathrm{int}(M)$.

Proof of Theorem 3.1. Let $\mu_0 \in \mathcal{M}$ be admissible; thus, $\mu_0(\psi) = \mu_{\min}(\psi)$ for some polynomial $\psi$ of the form (3.10). Letting $\mu_0(g) = y$, we have

(3.15)  $L(y) \le \mu_0(h) = \mu_0(\psi) + \sum_{j=1}^{n} d_j y_j = \mu_{\min}(\psi) + \sum_{j=1}^{n} d_j y_j.$

In view of (3.11), the equality sign must hold here, proving the first assertion. From Lemma 3.1, we have for almost all $y$ that this can happen for at most one polynomial $\psi$.

Conversely, let $y \in \operatorname{int}(M)$ be fixed and suppose that $L(y)$ is assumed by $\mu_0 \in \mathcal{M}$; thus, $\mu_0(g) = y$ and $\mu_0(h) = L(y)$. Next, choose the polynomial $\psi$ in such a way that (3.12) holds. Then

(3.16)  $\mu_0(\psi) = \mu_0(h) - \sum_{j=1}^{n} d_j \mu_0(g_j) = L(y) - \sum_{j=1}^{n} d_j y_j = \mu_{\min}(\psi).$

This shows that $\mu_0$ is critical for $\psi$ so that $\mu_0$ is admissible.

In view of Theorem 3.1, we would like to have applicable sufficient conditions on $\mathcal{M}$, $g = (g_1, \ldots, g_n)$ and $h$ in order that the infimum $L(y)$ be assumed. In Theorem 3.2, we take $\mathcal{M}$ as the collection $\mathcal{M}^*$ described in Theorem 2.1. Thus, adopting the notations and assumptions of Theorem 2.1, $\mathcal{M}^*$ is the class of probability measures on the locally compact space $X$ satisfying (2.1), with $\Phi = \{h_j, j \in J\}$ as a system of lower semicontinuous functions on $X$ satisfying a certain domination type of condition (which is void when $X$ is compact).

Theorem 3.2. Let $\mathcal{M} = \mathcal{M}^*$ be as in Theorem 2.1. Let further $g_j$, $j = 1, \ldots, n$, be given continuous functions on $X$ each dominated at infinity by some $\phi \in \Phi$ and let $h: X \to R$ be a lower semicontinuous function which is dominated below at infinity by some $\phi \in \Phi$.

Define $L(y)$ as in (3.1) and $M$ as in (3.2). We assert that $L(y)$ is assumed for each $y \in M$ in the sense that for each $y \in M$ there exists $\mu_0 \in \mathcal{M}^*$ with $\mu_0(g) = y$ and $\mu_0(h) = L(y)$.

Proof. Simply adjoin the (lower semicontinuous) functions $g_j$ and $-g_j$ to the given system $\Phi = \{h_j, j \in J\}$ and take $+y_j$ and $-y_j$ as the corresponding $\eta_j$ values. Now the assertion that $L(y)$ is assumed immediately follows from Theorem 2.2 applied to this enlarged system and with $f = -h$.

The  following  result  follows  directly  from  Theorem  3.2  by  observing  that  for 
a  compact  space  X  all  domination  conditions  are  void.  On  the  other  hand,  it 
would  not  be  very  hard  to  prove  Corollary  3.1  directly,  namely,  by  using  the 
simple  result  (i)  used  in  the  proof  of  Theorem  2.1. 

Corollary 3.1. Let $X$ be a compact space and $\{h_j, j \in J\}$ any collection of lower semicontinuous functions on $X$. Take $\mathcal{M}$ as the collection of all regular probability measures $\mu$ on $X$ with

(3.17)  $\int h_j(x)\,\mu(dx) \le \eta_j$ for each $j \in J$.

Here, the $\eta_j$ denote given real numbers. We assume that $\mathcal{M}$ is nonempty. Finally, let $g_j: X \to R$ be continuous, $j = 1, \ldots, n$, and let $h: X \to R$ be lower semicontinuous.

Under these assumptions the infimum $L(y)$ in (3.1) is assumed for each $y \in M$ so that Theorem 3.1 becomes applicable.

4.  Applications

4.1. In the present section we outline just one set of applications of the results in Section 3. We shall take the measurable space $X$ as a compact space and $\mathcal{M}$ simply as the collection of all regular probability measures on $X$. We further assume that the functions $g_1, \ldots, g_n$ are continuous, while for the moment we allow for $h$ any measurable function $h: X \to R$ which is bounded below. Consider the finite valued function

(4.1)  $\underline{h}(x) = \liminf_{y \to x} h(y);$

thus, $\underline{h}(x) \le h(x)$. In fact, $\underline{h}$ is precisely the largest lower semicontinuous function satisfying $\underline{h} \le h$. We assert that

(4.2)  $L(y \mid \underline{h}) = L(y \mid h)$ for each $y \in \operatorname{int}(M)$.


One way of seeing this would be to apply Lemma 3.1. Using (3.5), this yields that

(4.3)  $L(y \mid h) = \sup_{d} \Bigl[ \sum_{j=1}^{n} d_j y_j + \inf_{x \in X} \Bigl\{ h(x) - \sum_{j=1}^{n} d_j g_j(x) \Bigr\} \Bigr]$

for each $y \in \operatorname{int}(M)$. Here, $d$ runs through all $n$-tuples $d = (d_1, \ldots, d_n) \in R^n$. Now observe that, since the $g_j$ are continuous, the infimum in (4.3) remains unchanged when $h$ is replaced by $\underline{h}$; hence, (4.2) obtains.

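As a more intuitive proof, let $\varepsilon > 0$ be given and choose the neighborhood $U$ of $y \in \operatorname{int}(M)$ such that $L(z \mid \underline{h}) > L(y \mid \underline{h}) - \varepsilon$ for all $z \in U$. Next, choose $\mu_0 \in \mathcal{M}$ such that $\mu_0(g) = y$ and $\mu_0(\underline{h}) < L(y \mid \underline{h}) + \varepsilon$. We may assume (see [6], p. 95) that $\mu_0$ has a finite support (consisting of at most $n + 2$ points). By a slight movement of these support points we obtain a probability measure $\mu_1$ such that $\mu_1(h) < \mu_0(\underline{h}) + \varepsilon < L(y \mid \underline{h}) + 2\varepsilon$, while $z = \mu_1(g)$ still satisfies $z \in U$. We conclude that

(4.4)  $L(y \mid \underline{h}) - \varepsilon < L(z \mid h) < L(y \mid \underline{h}) + 2\varepsilon.$

This in turn yields (4.2).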

4.2. From now on, we shall assume that $h$ itself is lower semicontinuous. As indicated by equation (4.2), this is no real loss of generality when we want to compute $L(y) = L(y \mid h)$ for $y \in \operatorname{int}(M)$. If $y \in M$ is on the boundary of $M$, we should replace $X$ by an appropriate compact subset so as to get back to the situation $y \in \operatorname{int}(M)$; this can always be done (see [6], p. 102).

Knowing  that  h  is  lower  semicontinuous,  we  have  from  Corollary  3.1  (or 
from  an  easy  direct  proof)  that  L(y)  is  assumed  for  each  y  e  M.  From  now  on, 
in  this  section,  let  us  restrict  y  to  int  (M ). 

Then we conclude from Theorem 3.2 that the computation of $L(y)$ can be reduced to a study of admissible measures.

In the present case, we have from (3.5) and (3.6) that an admissible measure is any probability measure supported by the contact set $S(\psi)$ of some "polynomial" of the form

(4.5)  $\psi(x) = h(x) - \sum_{j=1}^{n} d_j g_j(x).$

Letting $d_0 = \inf \psi$, we have

(4.6)  $S(\psi) = \Bigl\{ x \in X : d_0 + \sum_{j=1}^{n} d_j g_j(x) = h(x) \Bigr\}.$

Here, $d_0$ is such that

(4.7)  $d_0 + \sum_{j=1}^{n} d_j g_j(x) \le h(x)$ for all $x \in X$,

and $d_0$ is maximal. Since $\psi$ is lower semicontinuous on the compact space $X$, this contact set $S(\psi)$ is always compact and nonempty. We now conclude from Theorem 3.1 that:

(i) all one needs to do in computing $L(y) = L(y \mid h)$ for given $y$ is to select the admissible measure $\mu_0$ in such a way that $\mu_0(g) = y$; afterwards, $L(y) = \mu_0(h)$;

(ii) this can always be done, that is, $\mu_0$ can always be found;

(iii) call a polynomial $\psi$ associated to $y$ when $S(\psi)$ carries a probability measure $\mu_0$ with $\mu_0(g) = y$. Such an associated polynomial always exists and almost all $y$ have exactly one associated polynomial. Finally, if $\psi = h - \sum_{j=1}^{n} d_j g_j$ is associated to $y$ then $L(y)$ is also given by

(4.8)  $L(y) = d_0 + \sum_{j=1}^{n} d_j y_j, \qquad d_0 = \inf \psi.$
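As a small illustration of (i) through (iii), here is a worked example that is not taken from the paper. It assumes $X = [0, 1]$, $n = 1$, $g_1(x) = x$ and $h(x) = x^2$, so that $L(y)$ is the smallest second moment of a distribution on $[0, 1]$ with mean $y$. The slope $d$ is restricted to $0 \le d \le 2$ so that the minimum of $\psi$ falls inside $[0, 1]$.

```latex
\begin{align*}
\psi(x) &= x^2 - d\,x, \qquad
  d_0 = \inf_{0 \le x \le 1} \psi(x) = -\tfrac{1}{4} d^2 \quad (0 \le d \le 2), \\
S(\psi) &= \{\, d/2 \,\}, \qquad
  \mu_0 = \delta_{d/2}, \qquad
  \mu_0(g) = d/2 = y \iff d = 2y, \\
L(y) &= d_0 + d\,y = -y^2 + 2y^2 = y^2 .
\end{align*}
```

The result agrees with Jensen's inequality: $E(Z^2) \ge (EZ)^2$, with equality for the point mass at $y$.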

The reader may enjoy using this principle in solving the following problem. Namely, let $Z$ be a real random variable with $0 \le Z \le 1$ and the first three moments $E(Z^j) = y_j$, $j = 1, 2, 3$, given. Let further $0 < \alpha < \beta < 1$ be given numbers. Now determine the best possible upper and lower bounds on $\Pr(\alpha < Z < \beta)$. For instance, as to the lower bound $L(y)$, either $L(y) = 0$ or the admissible measure corresponding to $y$ has one of the supports $\{\alpha, \beta, 1\}$, $\{0, \alpha, \xi_2, \beta\}$, $\{\alpha, u, 1\}$, $\{\alpha, v, \beta\}$, $\{0, w, \beta\}$. Here, $\xi_1$ and $\xi_2$ are fixed numbers, while $u, v, w$ are variable such that $\alpha < u < \xi_1 < v < \xi_2$, $\xi_2 < w < \beta$.

More or less the same result holds when $E(g_j(Z)) = y_j$, $j = 1, 2, 3$, and $\{g_0, g_1, g_2, g_3\}$ is a Chebyshev system on $[0, 1]$, $g_0 = 1$.

In this and other applications, the main advantage of the present approach comes from the fact that often a set $S(\psi)$ of the type (4.6), (4.7) must be quite small in some sense. This happens for instance when $X$ is an analytic manifold and the $g_j$ and $h$ are analytic or piecewise analytic. These aspects and further applications will be taken up in a subsequent paper.

5.  The moments of a sum

5.1. In the present section, we shall be concerned with the following situation where $\mathcal{M}$ is definitely not convex. Namely, let $X = R^k$ and take $\mathcal{M}$ to be of the form

(5.1)  $\mathcal{M} = \mathcal{M}^{(n)} = \mathcal{M}_1 * \cdots * \mathcal{M}_n.$

Here, the star denotes convolution while the $\mathcal{M}_i$, $i = 1, \ldots, n$, denote given collections of probability measures on $R^k$. In other words, $\mathcal{M}^{(n)}$ consists of all convolutions of the form $\mu = \mu_1 * \cdots * \mu_n$, where $\mu_i \in \mathcal{M}_i$, $i = 1, \ldots, n$. It is useful to interpret $\mathcal{M}^{(n)}$ also as the collection of all distributions $\mu(A) = \Pr(S_n \in A)$ of so-called admissible sums $S_n = Z_1 + \cdots + Z_n$ of $n$ independent random variables $Z_i \in R^k$ with the property that the $i$th component $Z_i$ has a distribution $\mu_i \in \mathcal{M}_i$, $i = 1, \ldots, n$.

Let $q$ be a fixed positive integer and assume that each measure $\mu \in \mathcal{M}_1 \cup \cdots \cup \mathcal{M}_n$ has all moments $y_j = \int x^j \, d\mu$ of order $|j| \le q$. Here, $j = (j_1, \ldots, j_k)$ denotes a multi-index with components $j_r \in Z_+ = \{0, 1, 2, \ldots\}$; further $|j| = \sum_r j_r$. The moment space $M_i$ corresponding to $\mathcal{M}_i$ will be defined as

(5.2)  $M_i = \Bigl\{ y \in R^{q^*} : \text{there exists } \mu \in \mathcal{M}_i \text{ with } \int x^j \, d\mu = y_j \text{ for all } j \in J \Bigr\},$

$i = 1, \ldots, n$. Here, $J$ denotes the set of all multi-indices $j$ with $1 \le |j| \le q$ and $q^*$ their number ($q^* = q$ if $k = 1$). Further, a point $y \in R^{q^*}$ is regarded as having coordinates $y_j$ with $j$ running through $J$.

Similarly, let $M^{(n)}$ denote the moment space corresponding to $\mathcal{M}^{(n)}$. Thus, $M^{(n)}$ is also the set of points $y = (y_j, j \in J)$ in $R^{q^*}$ such that there exists at least one admissible sum $S_n = Z_1 + \cdots + Z_n$ with $E(S_n^j) = y_j$ for all $j \in J$.

We shall be interested in the different relations between these moments $E(S_n^j)$, $j \in J$, that is, in the structure of $M^{(n)}$. In many applications the class $\mathcal{M}_i$ and thus the moment space $M_i$, $i = 1, \ldots, n$, are convex, while nevertheless $M^{(n)}$ itself and thus $\mathcal{M}^{(n)}$ are nonconvex.

5.2. In studying $M^{(n)}$ it is only natural to use cumulants $\kappa_j$, $j \in J$, since these have the addition property

(5.3)  $\kappa_j(S_n) = \kappa_j(Z_1) + \cdots + \kappa_j(Z_n)$ for all $j \in J$.

Let $K_i = K_i[q]$ denote the cumulant space corresponding to $\mathcal{M}_i$ and $M_i$, which consists of all points $z \in R^{q^*}$ such that, for some $\mu \in \mathcal{M}_i$, we have $\kappa_j(\mu) = z_j$ for all $j \in J$. Let $K^{(n)} = K^{(n)}[q]$ denote the analogous cumulant space corresponding to $\mathcal{M}^{(n)}$ and $M^{(n)}$. In other words, $z \in K^{(n)}$ if and only if we can find an admissible sum $S_n$ with $\kappa_j(S_n) = z_j$ for all $j \in J$. It follows from (5.3) that

(5.4)  $K^{(n)} = K_1 + \cdots + K_n,$

where the addition on the right side is ordinary addition of subsets of the additive group $R^{q^*}$; thus, $A + B = \{z : z = a + b \text{ for some } a \in A,\ b \in B\}$.

From now on, let us restrict ourselves to the special case where $\mathcal{M}_1 = \mathcal{M}_2 = \cdots = \mathcal{M}_n = \mathcal{M}$, say. Let $M$ and $K$ denote the moment space and cumulant space, respectively, corresponding to the given class $\mathcal{M}$. It follows from (5.4) that

(5.5)  $K^{(n)} = K + \cdots + K = K^n$

(in an obvious notation). We further have the important relations

(5.6)  $K \subset \frac{1}{n} K^{(n)} = \frac{1}{n} K^n \subset \operatorname{conv}(K).$

Here, $(1/n)K^n = \{z : nz \in K^n\}$ may thus also be regarded as the set of all points $z$ in $R^{q^*}$ of the form

(5.7)  $z_j = \frac{1}{n} \kappa_j(S_n), \quad j \in J,$

for some admissible sum $S_n$. Moreover, $z$ belongs to the smaller set $K$ in the chain (5.6) if and only if (5.7) holds for some sum $S_n = Z_1 + \cdots + Z_n$ having independent and identically distributed components $Z_i$.

The  following  result  due  to  Emerson  and  Greenleaf  ([2],  p.  180)  will  play  an 
important  role. 

Lemma 5.1. Let $K$ be a bounded subset of some Euclidean space and suppose that, for some integer $p \ge 1$, the set $(1/p)K^p$ has a nonempty interior relative to the minimal flat $\mathcal{L}(K)$ containing $K$.

Then there exists a constant $c > 0$ depending on $K$ only such that for any $z \in \operatorname{conv}(K)$ and any positive integer $n$ we have either $z \in (1/n)K^n$ or $z$ has a distance $\le c/n$ from the complement of $\operatorname{conv}(K)$ (taken relative to $\mathcal{L}(K)$).

5.3. Consider the situation where $\mathcal{M}$ and thus $K$ are fixed while $n$ is large. By (5.6), we always have $K^{(n)} \subset n \operatorname{conv}(K)$. It follows from Lemma 5.1 that in a certain sense $n \operatorname{conv}(K)$ is even a very good approximation to $K^{(n)}$. For instance, under the conditions of the lemma we have that $(1/n)K^{(n)}$ tends to $\operatorname{conv}(K)$ in the Hausdorff metric.

Thus, there are several good reasons for trying to determine $\operatorname{conv}(K)$. The only situation which we shall study in some more detail is that where $k = 1$ and

(5.8)  $\mathcal{M} = \{\text{all probability measures on } [0, c]\}.$

Here, $c$ denotes a fixed positive constant. In other words, we shall be concerned with the moment space $M^{(n)} = M^{(n)}[q]$ and the cumulant space $K^{(n)} = K^{(n)}[q]$ corresponding to the set of all sums $S_n = Z_1 + \cdots + Z_n$ of real valued independent random variables $Z_i$ with possibly different distributions, but such that $0 \le Z_i \le c$, $i = 1, \ldots, n$. Also recall that $\kappa_1(S) = E(S) = m$, say; $\kappa_2(S) = E(S - m)^2 = \operatorname{Var}(S)$; $\kappa_3(S) = E(S - m)^3$, while $\kappa_4(S) = E(S - m)^4 - 3[E(S - m)^2]^2$.

5.4. It is well known ([4], p. 106) that the moment space $M[q]$ corresponding to $\mathcal{M}$ is precisely the convex hull of the curve

(5.9)  $\{y : y = (t, t^2, \ldots, t^q) \text{ for some } 0 \le t \le c\}.$

Moreover, $y = (y_1, \ldots, y_q) \in R^q$ belongs to $M[q]$ if and only if $\sum_{j=0}^{q} a_j y_j \ge 0$ for every polynomial $\sum_{j=0}^{q} a_j x^j$ which is nonnegative on $[0, c]$; here, and in the sequel, $y_0 = 1$. The latter condition can be replaced by a small number of polynomial inequalities in $y_1, \ldots, y_q$, involving so-called Hankel determinants.

For low values of $q$ these conditions are as follows. First $0 \le y_1 \le c$ if $q \ge 1$; moreover, $y_1^2 \le y_2 \le c y_1$ if $q \ge 2$; moreover,

(5.10)  $\begin{vmatrix} y_1 & y_2 \\ y_2 & y_3 \end{vmatrix} \ge 0, \qquad \begin{vmatrix} c - y_1 & c y_1 - y_2 \\ c y_1 - y_2 & c y_2 - y_3 \end{vmatrix} \ge 0$

if $q \ge 3$; moreover,

(5.11)  $\begin{vmatrix} 1 & y_1 & y_2 \\ y_1 & y_2 & y_3 \\ y_2 & y_3 & y_4 \end{vmatrix} \ge 0, \qquad \begin{vmatrix} c y_1 - y_2 & c y_2 - y_3 \\ c y_2 - y_3 & c y_3 - y_4 \end{vmatrix} \ge 0$

if $q \ge 4$, and so on.

Next, we may write

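(5.12)  $K[q] = U M[q], \qquad M^{(n)}[q] = U^{-1}(K[q]^n),$

where $U$ denotes the usual one-to-one transformation of moment points $y = (y_1, \ldots, y_q)$ into cumulant points $\kappa = Uy = (\kappa_1, \ldots, \kappa_q)$. This transformation is defined by the formal power series identity

(5.13)  $\sum_{j \ge 1} \kappa_j \frac{t^j}{j!} = \log\Bigl(1 + \sum_{j \ge 1} y_j \frac{t^j}{j!}\Bigr);$

hence,

(5.14)  $\kappa_j = (Uy)_j = \sum (-1)^h \, h! \, j! \prod_{i=1}^{q} \frac{1}{r_i!} \Bigl(\frac{y_i}{i!}\Bigr)^{r_i}$

and

(5.15)  $y_j = (U^{-1}\kappa)_j = \sum j! \prod_{i=1}^{q} \frac{1}{r_i!} \Bigl(\frac{\kappa_i}{i!}\Bigr)^{r_i},$

$j = 1, \ldots, q$. Here, each summation extends over all the $q$-tuples $(r_1, \ldots, r_q) \in Z_+^q$ such that $\sum_{i} i \, r_i = j$. Further, $h = -1 + \sum_i r_i$.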
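For numerical work, the transformation $U$ and its inverse are usually easier to evaluate through the standard recursion $y_n = \sum_{k=1}^{n} \binom{n-1}{k-1} \kappa_k \, y_{n-k}$ (with $y_0 = 1$) than through the explicit sums over $q$-tuples. The following is a minimal sketch under that assumption; the function names are illustrative and not from the paper. It also evaluates the cumulants of the two-point measure on $\{0, c\}$ with mean $\rho$, which reappear in Sections 5.6 and 5.7.

```python
from math import comb


def cumulants_from_moments(moments):
    """Map (y_1, ..., y_q) to (kappa_1, ..., kappa_q) for a real random variable.

    Uses kappa_n = y_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * y_{n-k}.
    """
    kappa = []
    for n in range(1, len(moments) + 1):
        value = moments[n - 1]
        for k in range(1, n):
            value -= comb(n - 1, k - 1) * kappa[k - 1] * moments[n - k - 1]
        kappa.append(value)
    return kappa


def moments_from_cumulants(kappa):
    """Inverse map: (kappa_1, ..., kappa_q) to (y_1, ..., y_q)."""
    moments = []
    for n in range(1, len(kappa) + 1):
        value = kappa[n - 1]
        for k in range(1, n):
            value += comb(n - 1, k - 1) * kappa[k - 1] * moments[n - k - 1]
        moments.append(value)
    return moments


def two_point_moments(rho, c, q):
    """Moments E(Z^j), j = 1..q, when Z takes the value c with probability rho/c, else 0."""
    return [rho * c ** (j - 1) for j in range(1, q + 1)]


def two_point_cumulants(rho, c):
    """Closed form of the first four cumulants of the same two-point measure."""
    v = rho * (c - rho)
    return (rho, v, v * (c - 2 * rho), v * (c * c - 6 * c * rho + 6 * rho * rho))
```

Passing `fractions.Fraction` values keeps the arithmetic exact, which is convenient when checking boundary cases of the inequalities in this section.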

5.5. Let us first consider the case $q = 2$, the case $q = 1$ being trivial. By $\kappa_1 = y_1$ and $\kappa_2 = y_2 - y_1^2$, the set $K[2]$ is defined by

(5.16)  $0 \le \kappa_1 \le c, \qquad 0 \le \kappa_2 \le \kappa_1 (c - \kappa_1).$

It so happens that $K[2]$ is convex. Hence, the set $K[2]^n$ of all possible points $(\kappa_1(S_n), \kappa_2(S_n))$ coincides with $nK[2]$. Thus, $0 \le E(S_n) \le nc$ and

(5.17)  $\operatorname{Var}(S_n) \le E(S_n) \Bigl[ c - \frac{1}{n} E(S_n) \Bigr],$

and these inequalities cannot be improved.

5.6. Let us now turn to the case $q = 3$. The lower bound (5.10) on $y_3$ is of the form

(5.18)  $y_2 = E Z^2 = E(Z^{1/2} \cdot Z^{3/2}) \le (y_1 y_3)^{1/2}.$

It is valid whenever $Z \ge 0$ and is attained if and only if $\mu \in \mathcal{M}$ has a two-point support $\{0, \xi\}$ with $\xi \ge 0$. In terms of cumulants this becomes

(5.19)  $y_2^2 = (\kappa_1^2 + \kappa_2)^2 \le \kappa_1 (\kappa_1^3 + 3 \kappa_1 \kappa_2 + \kappa_3);$

thus,

(5.20)  $\kappa_3 \ge f(\kappa_1, \kappa_2) = (\kappa_2 - \kappa_1^2) \frac{\kappa_2}{\kappa_1}.$

The upper bound (5.10) on $y_3$ takes the form

(5.21)  $\kappa_3 \le g(\kappa_1, \kappa_2) = \bigl((c - \kappa_1)^2 - \kappa_2\bigr) \frac{\kappa_2}{c - \kappa_1}$

(and is assumed if and only if $\mu \in \mathcal{M}$ has a two-point support $\{\xi, c\}$). We conclude that $K[3]$ may also be described as the set of points $(\kappa_1, \kappa_2, \kappa_3)$ in $R^3$ satisfying (5.16), (5.20), and (5.21).

Lemma 5.2. The set $K[3]$ is not convex. Moreover, $K_3^* = \operatorname{conv} K[3]$ is equal to the convex hull of the curve

(5.22)  $\Pi = \{\Pi_\rho = (\rho,\ \rho(c - \rho),\ \rho(c - \rho)(c - 2\rho));\ 0 \le \rho \le c\}.$

It follows that $(z_1, z_2, z_3) \in K_3^*$ if and only if $(z_1, z_2) \in K[2]$ (that is, $0 \le z_2 \le z_1(c - z_1)$) and, moreover,

(5.23)  $-c z_2 + \frac{2 z_2^2}{z_1} \le z_3 \le c z_2 - \frac{2 z_2^2}{c - z_1}$

(if $0 < z_1 < c$; otherwise $z_2 = z_3 = 0$).

Remark 5.1. The $z_3$ projection of $K_3^*$ is nothing but $K[2] = K_2^*$. The $z_1$ projection is easily seen to be given by

(5.24)  $0 \le z_2 \le \tfrac{1}{4} c^2, \qquad |z_3| \le z_2 [c^2 - 4 z_2]^{1/2}.$

The $z_2$ projection is found to be

(5.25)
$-\tfrac{1}{8} c^2 z_1 \le z_3 \le z_1 (c - z_1)(c - 2 z_1)$, if $0 \le z_1 \le \tfrac{1}{4} c$;
$-\tfrac{1}{8} c^2 z_1 \le z_3 \le \tfrac{1}{8} c^2 (c - z_1)$, if $\tfrac{1}{4} c \le z_1 \le \tfrac{3}{4} c$;
$-z_1 (c - z_1)(2 z_1 - c) \le z_3 \le \tfrac{1}{8} c^2 (c - z_1)$, if $\tfrac{3}{4} c \le z_1 \le c$.

Proof of Lemma 5.2. If $K[3]$ were convex, then $f$ would be convex and $g$ would be concave on the domain $K[2]$ defined by (5.16). On the contrary, consider the lower boundary of $K[3]$ on the cross section $\kappa_2 = (c - \rho)\kappa_1$, where $0 < \rho \le c$, $0 \le \kappa_1 \le \rho$. It is given by

(5.26)  $\kappa_3 = f(\kappa_1, (c - \rho)\kappa_1) = (c - \rho)\kappa_1 (c - \rho - \kappa_1).$

The latter function is strictly concave (as soon as $\rho < c$) instead of convex. We also conclude that in forming the convex hull $K_3^*$ of $K[3]$ the lower boundary of the cross section may as well be replaced by the pair of endpoints, namely, the point $\Pi_0$ corresponding to $\kappa_1 = 0$ and $\Pi_\rho$ corresponding to $\kappa_1 = \rho$.

Similarly, the upper boundary of $K[3]$ in its cross section with $\kappa_2 = \rho(c - \kappa_1)$, $0 \le \rho \le c$, $\rho \le \kappa_1 \le c$, is given by the convex function

(5.27)  $\kappa_3 = g(\kappa_1, \rho(c - \kappa_1)) = \rho (c - \kappa_1)(c - \kappa_1 - \rho).$

In forming $K_3^*$, this part of the upper boundary may be replaced by the pair of endpoints $\Pi_\rho$ and $\Pi_c$.

It follows that indeed $K_3^*$ is precisely equal to the convex hull of the curve $\Pi$ described in (5.22). The linear transformation

(5.28)  $y_1 = z_1, \qquad y_2 = c z_1 - z_2, \qquad y_3 = c^2 z_1 - \tfrac{3}{2} c z_2 + \tfrac{1}{2} z_3$

sends the point $\Pi_\rho$ into a point $y$ with coordinates $y_j = \rho^j$, $j = 1, 2, 3$. Thus, it sends $K_3^*$ onto the corresponding convex hull which happens to be $M[3]$ (see (5.9)). The latter is determined by the inequalities $y_1^2 \le y_2 \le c y_1$ and (5.10). Transforming back, we conclude that $K_3^*$ is determined by the inequalities $0 \le z_2 \le z_1 (c - z_1)$ and (5.23).

Theorem 5.1. If $S_n$ is a sum of $n$ independent random variables $0 \le Z_i \le c$, then $z_j = (1/n)\kappa_j(S_n)$, $j = 1, 2, 3$, defines a point of $K_3^*$. Hence, we have, besides (5.17), that

(5.29)  $\frac{2 \operatorname{Var}(S_n)}{E S_n} - c \le \frac{\kappa_3(S_n)}{\operatorname{Var}(S_n)} \le c - \frac{2 \operatorname{Var}(S_n)}{nc - E S_n}.$

Thus, $\kappa_3(S_n) > 0$ as soon as $\operatorname{Var}(S_n) > \tfrac{1}{2} c (E S_n)$. Moreover, by (5.24),

(5.30)  $|\kappa_3(S_n)| \le \operatorname{Var}(S_n) \Bigl[ c^2 - \frac{4 \operatorname{Var}(S_n)}{n} \Bigr]^{1/2}.$

The inequalities (5.29) are sharp in the following sense. Let $z = (z_1, z_2, z_3)$ be a given point in $\operatorname{int}(K_3^*)$. Then for $n$ sufficiently large there exists an admissible sum $S_n$ with $(1/n)\kappa_j(S_n) = z_j$, $j = 1, 2, 3$.

Proof.  Combine  (5.6),  Lemma  5.1,  and  Lemma  5.2. 
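The first assertion of Theorem 5.1 is easy to check numerically. The sketch below is illustrative only: the helper names and the finite discrete distributions are assumptions, not part of the paper. It adds up the first three cumulants of independent summands with values in $[0, c]$, normalizes by $n$, and tests membership in $K_3^*$ through the inequalities $0 \le z_2 \le z_1(c - z_1)$ and (5.23).

```python
def first_three_cumulants(dist):
    """dist is a list of (value, probability) pairs; returns (kappa_1, kappa_2, kappa_3)."""
    mean = sum(p * x for x, p in dist)
    k2 = sum(p * (x - mean) ** 2 for x, p in dist)
    k3 = sum(p * (x - mean) ** 3 for x, p in dist)
    return mean, k2, k3


def normalized_sum_cumulants(dists):
    """z_j = kappa_j(S_n) / n for S_n a sum of independent summands with the given laws.

    Relies on the addition property (5.3) of cumulants.
    """
    n = len(dists)
    per_summand = [first_three_cumulants(d) for d in dists]
    totals = [sum(column) for column in zip(*per_summand)]
    return tuple(t / n for t in totals)


def in_K3_star(z, c):
    """Membership test for conv K[3]: 0 <= z2 <= z1 (c - z1) together with (5.23)."""
    z1, z2, z3 = z
    if not (0 <= z1 <= c and 0 <= z2 <= z1 * (c - z1)):
        return False
    if z1 == 0 or z1 == c:
        # Degenerate ends of the curve: z2 is already forced to 0, and z3 must vanish too.
        return z3 == 0
    lower = -c * z2 + 2 * z2 ** 2 / z1
    upper = c * z2 - 2 * z2 ** 2 / (c - z1)
    return lower <= z3 <= upper
```

With exact rational inputs (for instance `fractions.Fraction`), the test involves no rounding, so points on the boundary of $K_3^*$ are classified correctly.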

Remark 5.2. The point $\Pi_\rho$ is realized by the cumulants of the measure $\mu$ with support $\{0, c\}$ and mass $\rho/c$ at the point $c$. Thus, $\Pi \subset K[3]$; hence, $(1/n)\Pi^n \subset (1/n)K[3]^n$ and each tends to $K_3^*$ in the sense of Lemma 5.1.

It can be shown that to each point $z \in \operatorname{int}(K_3^*)$ there corresponds an integer $n_0$ such that for each integer $n \ge n_0$ there exists a representation of $z$ as $z = \sum_{h=1}^{3} (n_h/n)\,\Pi_{\rho_h}$ with $n_h \in Z_+$, $\sum_h n_h = n$, $0 \le \rho_h \le c$ (all depending on $n$ and $z$, $h = 1, 2, 3$). This implies that $z$ can be realized as $z_j = (1/n)\kappa_j(S_n)$, $j = 1, 2, 3$, by a sum of independent random variables $S_n = Z_1 + \cdots + Z_n$ such that each $Z_i$ takes only the values $0$ and $c$, while $\Pr(Z_i = c) = \rho_h/c$ for exactly $n_h$ indices $i = 1, \ldots, n$, $h = 1, 2, 3$.

5.7. Let us finally consider the case $q = 4$, restricting ourselves to a lower bound on $\kappa_4(S_n)$.

If $Z$ is any random variable with a finite fourth moment, then the best lower bound for the cumulant $\kappa_4 = \kappa_4(Z)$ in terms of the lower cumulants is given by

(5.31)  $\kappa_4 \ge \frac{\kappa_3^2}{\kappa_2} - 2 \kappa_2^2.$

If $0 \le Z \le c$, then (5.31) follows immediately from the first inequality (5.11). Since the validity of (5.31) is not affected by a change of location or scale, it follows that (5.31) holds for each bounded $Z$ and thus for each $Z$ with a finite fourth moment.

The argument further shows that $K[4]$ can be defined by the inequalities (5.16), (5.20), (5.21), (5.31), and a somewhat more complicated upper bound on $\kappa_4$ which may be derived from the second inequality (5.11) (and which will involve $c$).

Lemma 5.3. The lower boundary of $K_4^* = \operatorname{conv} K[4]$ is the same as the lower boundary of the convex hull of the curve $\Sigma = \{\Sigma_\rho;\ 0 \le \rho \le c\}$, where

(5.32)  $\Sigma_\rho = (\rho,\ \rho(c - \rho),\ \rho(c - \rho)(c - 2\rho),\ \rho(c - \rho)(c^2 - 6c\rho + 6\rho^2)).$

Moreover, $z \in R^4$ belongs to $K_4^*$ if and only if:

(i) $(z_1, z_2, z_3) \in K_3^*$; that is, $0 \le z_2 \le z_1(c - z_1)$ and (5.23) hold;

(ii) we have the lower bound

(5.33)  $z_4 \ge \frac{3}{2} \frac{z_3^2}{z_2} - \frac{1}{2} c^2 z_2;$

(iii) $z_4$ satisfies an analogous upper bound $z_4 \le h(z_1, z_2, z_3)$ which will not be specified.

Remark 5.3. We shall need to relate the inequalities (5.33) and

(5.34)  $z_4 \ge \frac{z_3^2}{z_2} - 2 z_2^2,$

where $z_2 > 0$. In fact, (5.34) implies (5.33) precisely when

(5.35)  $\frac{z_3^2}{z_2} - 2 z_2^2 \ge \frac{3}{2} \frac{z_3^2}{z_2} - \frac{1}{2} c^2 z_2,$

which is equivalent to the pair of inequalities

(5.36)  $z_2 \le \tfrac{1}{4} c^2, \qquad |z_3| \le z_2 [c^2 - 4 z_2]^{1/2}.$

By (5.24) this is true if and only if $(z_2, z_3)$ corresponds to some point $z = (z_1, z_2, z_3) \in K_3^*$. In all other cases, in particular when $z_2 > \tfrac{1}{4} c^2$, we have that (5.33) implies (5.34).

Proof of Lemma 5.3. The vertical projection (in the $\kappa_4$ direction) of $K[4]$ is precisely $K[3]$. Hence, the vertical projection of $\operatorname{conv} K[4] = K_4^*$ is precisely $\operatorname{conv} K[3] = K_3^*$ and the latter is completely described by Lemma 5.2.

Consider the curve $\Sigma$ described by (5.32). We easily verify that the point $\Sigma_\rho$ of the curve is realized by the first four cumulants of the probability measure $\mu_\rho$ having the two-point support $\{0, c\}$ and a mass $\rho/c$ at $c$. Hence, $\Sigma \subset K[4]$ and therefore, $\operatorname{conv}(\Sigma) \subset \operatorname{conv} K[4] = K_4^*$. By Lemma 5.2, the vertical projection of $\operatorname{conv}(\Sigma)$ is precisely $\operatorname{conv}(\Pi) = K_3^*$.

For each point $z = (z_1, z_2, z_3)$ in $K_3^*$, let us define

(5.37)  $\phi(z) = \inf \{\zeta : (z_1, z_2, z_3, \zeta) \in K_4^*\}$

and

(5.38)  $\psi(z) = \inf \{\zeta : (z_1, z_2, z_3, \zeta) \in \operatorname{conv}(\Sigma)\}.$

Clearly, both $\phi$ and $\psi$ are convex functions on $K_3^*$. Further, $\phi(z) \le \psi(z)$ since $\operatorname{conv}(\Sigma) \subset K_4^*$. We shall prove below that

(5.39)  $\psi(z) = \frac{3}{2} \frac{z_3^2}{z_2} - \frac{1}{2} c^2 z_2$ for all $z \in K_3^*$,

and it would suffice to prove that $\phi = \psi$.

In fact, we know from (5.31) that, for each $z \in K[4]$, $z_4 \ge z_3^2/z_2 - 2 z_2^2$. Using formula (5.39) and Remark 5.3, we conclude that $z_4 \ge \psi(z_1, z_2, z_3)$ for each $z \in K[4]$, and hence, for each $z \in \operatorname{conv} K[4] = K_4^*$ since the function $\psi$ is convex. This in turn implies that $\phi(z) \ge \psi(z)$, and hence, $\phi(z) = \psi(z)$ for all $z \in K_3^*$.

It only remains to verify the formula (5.39) for the function $\psi$ defined by (5.38). One proof would be to derive it from the second inequality (5.11) by a transformation analogous to the one used in the last part of the proof of Lemma 5.2. Another proof would be as follows.

First, introduce $\tilde\psi(z) = \tfrac{3}{2} z_3^2/z_2 - \tfrac{1}{2} c^2 z_2$. Then $\tilde\psi$ is convex throughout the region $z_2 > 0$, as can for instance be seen from the formula

(5.40)  $\tilde\psi(z) = \sup_{u} \bigl[ -\tfrac{1}{2}(3u^2 + c^2) z_2 + 3u z_3 \bigr].$

It follows that the region $W = \{z \in R^4 : z' \in K_3^*,\ z_4 \ge \tilde\psi(z')\}$ is convex; here $z' = (z_1, z_2, z_3)$ when $z = (z_1, z_2, z_3, z_4)$.

Second, an easy computation shows that

(5.41)  $z_4 = \tilde\psi(z')$ for each $z \in \Sigma$;

$\tilde\psi(\Sigma_\rho) = 0$ if $\rho = 0$ or $\rho = c$. Hence, $\Sigma \subset W$, and hence, $\operatorname{conv}(\Sigma) \subset W$; thus, $\tilde\psi(z) \le \psi(z)$ throughout $K_3^*$.

Third, if $z \in \Sigma$ then, using (5.38) and (5.41), we have that $\psi(z') \le z_4 = \tilde\psi(z')$; thus, $\tilde\psi(z') = \psi(z')$.

Fourth, on the line segment $z_3 = \lambda z_2$, $|\lambda| < c$, $z \in K_3^*$, the function $\tilde\psi(z)$ is linear in $z_2$ while $\psi(z)$ is convex so that the difference $\chi(z) = \psi(z) - \tilde\psi(z)$ is convex and nonnegative. Moreover, $\chi(z) = 0$ at both end points (which correspond to points of $\Sigma$), and hence, throughout the entire segment. This proves that $\tilde\psi(z) = \psi(z)$ throughout $K_3^*$ and establishes (5.39).

Theorem 5.2. Suppose $S_n = Z_1 + \cdots + Z_n$ is the sum of $n$ independent random variables $0 \le Z_i \le c$. Then

(5.42)  $\kappa_4(S_n) \ge \frac{3}{2} \frac{\kappa_3(S_n)^2}{\kappa_2(S_n)} - \frac{1}{2} c^2 \kappa_2(S_n).$

The inequality (5.42) is sharp in the sense that given $z \in \operatorname{int}(K_4^*)$ (in particular $z_4 > \tfrac{3}{2} z_3^2/z_2 - \tfrac{1}{2} c^2 z_2$) there exists for each sufficiently large integer $n$ an admissible sum $S_n$ with $(1/n)\kappa_j(S_n) = z_j$, $j = 1, 2, 3, 4$.

Proof.  Combine  (5.6),  Lemma  5.1,  and  Lemma  5.3. 

Remark 5.4. In view of (5.31), we also have

(5.43)  $\kappa_4(S_n) \ge \frac{\kappa_3(S_n)^2}{\kappa_2(S_n)} - 2 \kappa_2(S_n)^2.$

As follows from Remark 5.3, (5.43) is better than (5.42) if and only if $(\kappa_2(S_n), \kappa_3(S_n))$ corresponds to a point $(z_1, \kappa_2(S_n), \kappa_3(S_n))$ of $K_3^* = \operatorname{conv} K[3]$, for some choice of $z_1$. For $n$ large this is rarely the case. If $\kappa_2(S_n)$ is large, then (5.42) is obviously much more precise than (5.43).
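To see the gap between the two bounds in a concrete case, the following sketch compares them for a sum of $n$ identically distributed two-point variables on $\{0, c\}$, whose cumulants are $n$ times the point $\Sigma_\rho$ of (5.32). The function names are illustrative and the choice of example is ours, not the paper's.

```python
from fractions import Fraction


def bound_5_42(k2, k3, c):
    """Right side of (5.42): (3/2) k3^2 / k2 - (1/2) c^2 k2."""
    return Fraction(3, 2) * k3 ** 2 / k2 - Fraction(1, 2) * c ** 2 * k2


def bound_5_43(k2, k3):
    """Right side of (5.43): k3^2 / k2 - 2 k2^2."""
    return k3 ** 2 / k2 - 2 * k2 ** 2


def two_point_sum_cumulants(n, rho, c):
    """First four cumulants of S_n = Z_1 + ... + Z_n, Z_i i.i.d. on {0, c} with mean rho."""
    v = rho * (c - rho)
    return (
        n * rho,
        n * v,
        n * v * (c - 2 * rho),
        n * v * (c * c - 6 * c * rho + 6 * rho * rho),
    )


if __name__ == "__main__":
    c = 1
    for n in (1, 10, 100):
        k1, k2, k3, k4 = two_point_sum_cumulants(n, Fraction(1, 2), c)
        print(n, k4, bound_5_42(k2, k3, c), bound_5_43(k2, k3))
```

For this symmetric example $\kappa_3 = 0$, so (5.42) holds with equality for every $n$, while the right side of (5.43) decreases like $-n^2/8$.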


6.  Exponential  bounds 

6.1. In this section we shall again be interested in a sum $S_n = Z_1 + \cdots + Z_n$ of independent real valued random variables. We shall make two assumptions.

(i) Let $a < b$ be given finite constants and assume that $a \le Z_i \le b$ for all $i = 1, \ldots, n$.

(ii) Let $m$ be a given positive integer and $c_1, \ldots, c_m$ given numbers. We assume that

(6.1)  $E(Z_i^j) = c_j$ for $j = 1, \ldots, m$ and all $i = 1, \ldots, n$.

Naturally, $c_1, \ldots, c_m$ must be such that there do exist such random variables $a \le Z_i \le b$; that is, $c = (c_1, \ldots, c_m)$ must be a point of the corresponding moment space $M[m]$.

Most of the difficulties encountered in Section 5 concerning the precise relations between the first $q$ moments of $S_n$ had to do with the fact that the transformation (5.14) between moments and cumulants is a nonlinear transformation which may transform a convex set $M[q]$ into a nonconvex set $K[q]$.

In the present section, we want to exploit the fact that $\kappa_j$ happens to be linear in the higher moments $y_j$. Fixing the lower moments as in (6.1) will make the cumulants $\kappa_j(Z_i)$ with $j \le 2m + 1$ into an affine function of the unknown moments $y_r(Z_i) = E(Z_i^r)$. More precisely, (5.14) and (6.1) together imply a relation of the form

(6.2)  $\kappa_j(Z_i) = a_j + \sum_{r=m+1}^{j} b_{j,r}\, y_r(Z_i), \quad j = m + 1, \ldots, 2m + 1,$

valid for all $i = 1, \ldots, n$. Here, the coefficients $a_j$ and $b_{j,r}$ are known constants (depending only on $c_1, \ldots, c_m$).

The collection $M$ of possible moment points $y = (y_{m+1}(Z_i), \ldots, y_{2m+1}(Z_i))$ is a known convex subset of $R^{m+1}$. Hence, from (6.2), also the collection $K$ of all possible cumulant points $z = (\kappa_{m+1}(Z_i), \ldots, \kappa_{2m+1}(Z_i))$ is a known convex subset of $R^{m+1}$, the same set for all $i = 1, \ldots, n$. By (5.3), the set of all possible cumulant points $(\kappa_{m+1}(S_n), \ldots, \kappa_{2m+1}(S_n))$ of $S_n$ is precisely the set $K + K + \cdots + K = K^n = nK$, the latter because $K$ is convex. This leads to the statement

(6.3)  $\Bigl( \frac{1}{n} \kappa_{m+1}(S_n), \ldots, \frac{1}{n} \kappa_{2m+1}(S_n) \Bigr) \in K.$

And for each integer $n$ this is more or less all we can say about these cumulants, that is, about the moments $E(S_n^j)$ with $j \le 2m + 1$; the transformation (6.2) cannot be used, however, for $Z_i$ replaced by $S_n$, but has to be modified since $S_n$ has its lower moments different from $Z_i$.

Suppose $Z_1, \ldots, Z_n$ besides satisfying (i), (ii) also satisfy:

(iii) the $Z_i$ are identically distributed.

Then

(6.4)  $\Bigl( \frac{1}{n} \kappa_{m+1}(S_n), \ldots, \frac{1}{n} \kappa_{2m+1}(S_n) \Bigr) = (\kappa_{m+1}(Z_1), \ldots, \kappa_{2m+1}(Z_1)),$

but still the latter can be any point in $K$. In other words, the relation (6.3) cannot be improved at all. That is, any relation between the moments $E(S_n^j)$ with $j \le 2m + 1$ which is universally true under the assumptions (i), (ii), and (iii) is thus also universally true under the assumptions (i) and (ii) alone.

6.2.  Let us now consider the following related problem. Namely, we still
assume (i) and (ii), but we now want to add the assumption:

(iv) let $q$ be a given positive integer with $m \le q \le 2m + 1$ and assume that

(6.5)  $E(S_n^j) = d_j$ for all $m < j \le q$,

where the $d_j$ are given numbers. Naturally (6.5) is void when $q = m$. Since (i),
(ii), and (iv) must have a common solution, the $d_j$ cannot be entirely arbitrary.
Note that the distributions of the $Z_i$ will be different in general.

We shall show that in this situation it is not difficult to obtain an exact formula
(in terms of $a$, $b$, the $c_j$ and $d_j$) for the quantity

(6.6)  $A(t) = \sup \log E(\exp\{tS_n\}),$

where $S_n$ ranges through all sums of the above type, while $t$ denotes a fixed
constant, $t \ne 0$. Such an exact formula may be useful in connection with the
well-known inequality

(6.7)  $\Pr(S_n \ge nx) \le \exp\{-tnx\}\, E(\exp\{tS_n\}) \le \exp\{-tnx + A(t)\}$

if $t > 0$.

6.3.  Let us first reformulate the above problem. In the first place, we may
rewrite (6.6) as

(6.8)  $A(t) = \sup_{Z_1, \dots, Z_n} \sum_{i=1}^{n} \log E(\exp\{tZ_i\}),$

where $(Z_1, \dots, Z_n)$ ranges through the $n$-tuples of random variables satisfying
(i), (ii), while further $\sum_{i=1}^{n} \kappa_j(Z_i)$ is equal to a given number when $m < j \le q$.
Using (6.1), (6.2), and $q \le 2m + 1$, the latter condition can be rewritten as

(6.9)  $\sum_{i=1}^{n} E(Z_i^j) = e_j$ for all $m < j \le q$,

where the $e_j$ denote given numbers, which are easily calculated from the $c_j$ and
$d_j$. We now have that in (6.8) the $Z_i$ range through the $n$-tuples satisfying (i), (ii)
and (6.9); in the present formulation the independence of the $Z_i$ is no longer
important. Let us now proceed to show that the supremum in (6.8) is attained
for the case where the $Z_i$ are identically distributed. This will reduce our problem
to the more or less classical one of finding

(6.10)  $A(t) = n \sup \log E(\exp\{tZ\}),$

where $Z$ is a random variable subject only to the conditions that $a \le Z \le b$
and that $E(Z^j) = c_j$, $j = 1, \dots, q$, with $c_1, \dots, c_q$ as given numbers.

6.4.  The above problem, to determine the maximum (6.8) subject to the
conditions $a \le Z_i \le b$, (6.1), and (6.9), may be generalized as follows.

Namely, consider a measurable space $X$ and a given convex class $M$ of
probability measures on $X$. Let further $\psi$ and $g_j$, $j = 1, \dots, r$, be given real valued
measurable functions on $X$ which are integrable relative to each $\mu \in M$. Finally,
let $\phi$ be a given real valued and concave function defined on an interval
containing the range of $\psi$. Our problem will be to determine

(6.11)  $\max \sum_{i=1}^{n} \phi\bigl( E[\psi(Z_i)] \bigr).$

Here, $Z_1, \dots, Z_n$ will denote random variables taking values in $X$ such that
(a) the distribution $\mu_i$ of $Z_i$ belongs to the given class $M$, $i = 1, \dots, n$; (b) the
$Z_i$ must further satisfy the side conditions

(6.12)  $\sum_{i=1}^{n} E(g_j(Z_i)) = e_j$ for $j = 1, \dots, r$.

Here, the $e_j$ denote given numbers such that there do exist $n$-tuples of random
variables satisfying the stated properties.




6.5.  In our original problem we have $X = [a, b]$. Further, $M$ may be taken
as the class of all probability measures on $X$ satisfying $\int x^j \mu(dx) = c_j$, $j =
1, \dots, m$. Moreover, in (6.12) we take $r = q - m$ and $g_j(x) = x^{m+j}$. Finally,
$\phi(u) = \log u$ and $\psi(x) = e^{tx}$. Observe that $\log u$ is indeed concave on the range
$(0, \infty)$ of $\psi$.

Lemma 6.1.  In the above general problem of determining (6.11), the supremum
is not decreased by adding the additional condition that $Z_1, \dots, Z_n$ all have the
same distribution.

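Proof.  Let $Z_i$ have distribution $\mu_i \in M$, $i = 1, \dots, n$, and put $\mu =
(\mu_1 + \cdots + \mu_n)/n$. Then $\mu \in M$ since $M$ is convex. Moreover, the side
conditions (6.12) remain satisfied when each $Z_i$ has distribution $\mu$, since
$n \int g_j(x)\, \mu(dx) = \sum_{i=1}^{n} \int g_j(x)\, \mu_i(dx) = e_j$. It thus suffices to prove that

(6.13)  $\frac{1}{n} \sum_{i=1}^{n} \phi\left( \int \psi(x)\, \mu_i(dx) \right) \le \phi\left( \int \psi(x)\, \mu(dx) \right).$

But this follows immediately from Jensen's inequality.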
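
The averaging step in this proof can be illustrated numerically. The sketch below takes $\phi = \log$ and $\psi(x) = e^{tx}$ as in Section 6.5, with three finitely supported mean-zero distributions on $[-1, 1]$ chosen purely for illustration. The helper names `expect` and `average_measure` are hypothetical. It compares the two sides of (6.13) and also confirms that a side condition of the form (6.12), here with $g(x) = x^2$, is unchanged by averaging.

```python
import math

def expect(f, dist):
    """Expectation of f under a finitely supported measure [(point, prob), ...]."""
    return sum(p * f(x) for x, p in dist)

def average_measure(dists):
    """The measure mu = (mu_1 + ... + mu_n)/n, as a list of (point, prob)."""
    n = len(dists)
    return [(x, p / n) for dist in dists for x, p in dist]

t = 1.0
psi = lambda x: math.exp(t * x)   # psi(x) = e^{tx}
phi = math.log                    # concave on (0, infinity)

# Three illustrative distributions on [-1, 1], each with mean zero.
dists = [
    [(-1.0, 0.5), (1.0, 0.5)],
    [(-0.5, 2 / 3), (1.0, 1 / 3)],
    [(0.0, 1.0)],
]
mu = average_measure(dists)

lhs = sum(phi(expect(psi, d)) for d in dists) / len(dists)   # left side of (6.13)
rhs = phi(expect(psi, mu))                                   # right side of (6.13)

g = lambda x: x * x
side_before = sum(expect(g, d) for d in dists)   # sum_i E g(Z_i)
side_after = len(dists) * expect(g, mu)          # n * E g(Z) under mu

print(lhs <= rhs, abs(side_before - side_after) < 1e-12)
```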

6.6.  Let us return to the original problem (6.6). Applying Lemma 6.1, we
conclude that (6.8) may be reformulated as in (6.10). Now observe that the $q + 2$
functions $g_j(x) = x^j$, $j = 0, 1, \dots, q$, and $g_{q+1}(x) = e^{tx}$ together form a
Chebyshev system over every subinterval of $R$ (see [7], p. 45, [4], pp. 6 and 376).
It follows from a classical result due to Markov (see [7], p. 61, [4], pp. 55 and 80)
that the supremum (6.10) is attained for either the upper or the lower principal
representation (depending on the sign of $t$) corresponding to the preassigned
moment point $y = (c_1, \dots, c_q)$ (provided $y \in \mathrm{int}(M[g])$, which we shall assume).

Here, by a principal representation we mean a probability measure on $[a, b]$
having the preassigned moments $c_1, \dots, c_q$ and further a finite support $S$
consisting of $\frac{1}{2}(q + 1)$ points (an end point $a$ or $b$ counting only as half a point).
There are exactly two such principal representations of $y$. Observe that the pair
of principal representations is totally independent of $t$. Assuming that $t > 0$,
the principal representation maximizing $\int e^{tx} \mu(dx)$ always has the right end point
$b$ in its support and this requirement uniquely determines the principal
representation needed; if $t < 0$ then the left end point $a$ must be in the support.

Theorem 6.1.  Let $S_n = Z_1 + \cdots + Z_n$, where $Z_1, \dots, Z_n$ are independent
random variables with possibly different distributions such that $|Z_i| \le 1$ and
$E(Z_i) = 0$, $i = 1, \dots, n$. Put $\mathrm{Var}(S_n) = s^2 = nc$, $0 \le c \le 1$, and let $t > 0$ be
a given constant. Then

(6.14)  $E(e^{tS_n}) \le \left[ \frac{c}{1+c}\, e^{t} + \frac{1}{1+c}\, e^{-ct} \right]^{n},$

and this inequality cannot be improved.

Proof.  We must compute (6.6) subject to $|Z_i| \le 1$, $E(Z_i) = 0$ and $E(S_n^2) =
nc$. This is a very special case of the general problem (6.6). We already showed that
we may assume the $Z_i$ to be identically distributed; hence, we must prove that

(6.15)  $E(e^{tZ}) \le \frac{c}{1+c}\, e^{t} + \frac{1}{1+c}\, e^{-ct},$


when it is known that $|Z| \le 1$, $E(Z) = 0$, $E(Z^2) = s^2/n = c$. The largest possible
value $E(e^{tZ})$ must be attained by the principal representation corresponding to
the moments $y_1 = 0$, $y_2 = c$ with a support $S$ of size $\frac{3}{2}$ and such that $1 \in S$; we
have $(y_1, y_2) \in \mathrm{int}\, M[g]$ provided $0 < c < 1$; the relation (6.15) is obvious
when $c = 0$ or $c = 1$. It follows that necessarily $S = \{-c, 1\}$, while $\Pr(Z = 1) =
c/(1 + c)$. This proves (6.15). If each $Z_i$ has the latter distribution, then (6.14)
holds with the equality sign and hence cannot be improved.
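
As a numerical illustration of Theorem 6.1, the sketch below checks that the two-point distribution on $\{-c, 1\}$ has mean $0$ and second moment $c$, and compares its moment generating function with those of two other mean-zero distributions on $[-1, 1]$ having the same variance. The values $c = 0.25$, $t = 1$, $n = 10$, the competing distributions, and the helper names `extremal_mgf` and `mgf` are all assumptions made for the example; a finite comparison of this kind illustrates the bound but does not prove it.

```python
import math

def extremal_mgf(t, c):
    """Right-hand side of (6.15): the m.g.f. of the distribution on {-c, 1}."""
    return c / (1 + c) * math.exp(t) + 1 / (1 + c) * math.exp(-c * t)

def mgf(t, dist):
    """E exp(tZ) for a finitely supported distribution [(point, prob), ...]."""
    return sum(p * math.exp(t * x) for x, p in dist)

c, t, n = 0.25, 1.0, 10

# Principal representation with the end point 1 in its support.
extremal = [(-c, 1 / (1 + c)), (1.0, c / (1 + c))]
mean = sum(p * x for x, p in extremal)
second_moment = sum(p * x * x for x, p in extremal)

# Two other distributions with |Z| <= 1, E(Z) = 0, E(Z^2) = c.
competitors = {
    "symmetric two-point": [(-math.sqrt(c), 0.5), (math.sqrt(c), 0.5)],
    "three-point {-1, 0, 1}": [(-1.0, c / 2), (0.0, 1 - c), (1.0, c / 2)],
}

bound_single = extremal_mgf(t, c)
bound_sum = bound_single ** n        # right-hand side of (6.14)
for name, dist in competitors.items():
    print(name, mgf(t, dist) <= bound_single)
```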

6.7.  As a final application, suppose $S_n = Z_1 + \cdots + Z_n$ is a sum of
independent random variables such that $a \le Z_i \le b$ and $E(Z_i) = 0$, $i = 1, \dots, n$;
here, $a < 0 < b$ are fixed. We want to establish the best possible upper bound on
$E(\exp\{tS_n\})$, $t > 0$, in terms of $E(S_n^2) = s^2 = nc$ and $E(S_n^3) = \mu_3 = nd$, say.
There does exist a random variable $Z$ with $a \le Z \le b$ and $E(Z) = 0$, $E(Z^2) = c$,
$E(Z^3) = d$ (see (6.3)). Moreover, from (6.10) and Section 6.6,

(6.16)  $\sup E(\exp\{tS_n\}) = (E \exp\{tZ\})^n,$

where $a \le Z \le b$ has as its distribution the principal representation of the set
of moments $y_1 = 0$, $y_2 = c$, $y_3 = d$ and with $b$ in the support $S$; here, we assume
that $y$ is interior. Further, the support has $\frac{1}{2}(3 + 1) = 2$ points (counting end
points half), implying that it is of the form $S = \{a, \xi, b\}$ with $a < \xi < b$. This
leads to the equations $p a^j + q \xi^j + r b^j = y_j$, $j = 0, 1, 2, 3$, $y_0 = 1$, $y_1 = 0$,
$y_2 = c$, $y_3 = d$, so that $\xi = (d - ac - bc)/(ab + c)$. The desired optimal upper
bound is given by

(6.17)  $E(\exp\{tS_n\}) \le (p e^{at} + q e^{\xi t} + r e^{bt})^n,$

which is easily computed. If desired, we can also derive a bound

(6.18)  $\Pr(S_n \ge nx) \le \left[ e^{-xt} (p e^{at} + q e^{\xi t} + r e^{bt}) \right]^n,$

provided $t > 0$, $a < x < b$. The best value of $t$ is obtained by putting the
logarithmic derivative with respect to $t$ equal to 0. In the special case $\xi =
\frac{1}{2}(a + b)$, this leads to a simple quadratic equation in $u = e^{(b-a)t/2}$; by the
formula for $\xi$, this happens when

$d = \frac{1}{2}(a + b)(ab + 3c)$, that is, $d = 0$ if $a = -b$.
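
The computation described in this subsection can be sketched as follows. The code assumes the moment point is interior, obtains the weights $p, q, r$ from the moment equations for $j = 0, 1, 2$ in closed form, and treats only the special case $\xi = \frac{1}{2}(a + b)$, where the stationarity condition becomes a quadratic in $u = e^{(b-a)t/2}$. The function names and the sample values $a = -1$, $b = 2$, $c = 1.1$, $x = 0.5$, $n = 20$ are hypothetical choices for illustration only.

```python
import math

def three_point_representation(a, b, c, d):
    """Support point xi and weights (p, q, r) on {a, xi, b} with moments 1, 0, c, d."""
    xi = (d - a * c - b * c) / (a * b + c)
    # Weights from E[(Z - u)(Z - v)] = c + u*v, valid since E(Z) = 0 and E(Z^2) = c.
    p = (c + xi * b) / ((a - xi) * (a - b))
    q = (c + a * b) / ((xi - a) * (xi - b))
    r = (c + a * xi) / ((b - a) * (b - xi))
    return xi, p, q, r

def log_tail_bound(t, x, n, a, b, c, d):
    """Logarithm of the right-hand side of (6.18)."""
    xi, p, q, r = three_point_representation(a, b, c, d)
    m = p * math.exp(a * t) + q * math.exp(xi * t) + r * math.exp(b * t)
    return n * (-x * t + math.log(m))

def best_t_midpoint_case(x, a, b, p, q, r):
    """Stationary t when xi = (a + b)/2, from
    r(b - x) u^2 + q(xi - x) u + p(a - x) = 0 with u = exp((b - a) t / 2)."""
    xi = 0.5 * (a + b)
    A, B, C = r * (b - x), q * (xi - x), p * (a - x)
    u = (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)   # the positive root
    return 2 * math.log(u) / (b - a)

a, b, c, x, n = -1.0, 2.0, 1.1, 0.5, 20
xi_target = 0.5 * (a + b)
d = xi_target * (a * b + c) + (a + b) * c    # invert the formula for xi
xi, p, q, r = three_point_representation(a, b, c, d)
t_star = best_t_midpoint_case(x, a, b, p, q, r)
bound = math.exp(log_tail_bound(t_star, x, n, a, b, c, d))
print(xi, (p, q, r), t_star, bound)
```

Outside the midpoint case the stationarity condition is no longer a quadratic. Since the logarithm of the bound is convex in $t$, a one-dimensional numerical minimization would be a natural substitute.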